Compiler Construction

Chapter 4 Syntax Analysis

Topics to cover:

  • Context-Free Grammars:
      • Concepts and Notation
      • Writing and rewriting a grammar
  • Syntax Error Handling and Recovery

Introduction

Why CFG

  • A CFG gives a precise syntactic specification of a programming language.
  • Automatic efficient parser generator
  • Enabling automatic translator generator
  • Language extension becomes easier

The role of the parser

  • Taking tokens from the scanner, parsing, reporting syntax errors
  • Not just parsing: in a syntax-directed translator, the parser also conducts type checking, semantic analysis and IR generation.

Example of CFG

A C– program is made out of functions, a function out of declarations and blocks, a block out of statements, a statement out of expressions, etc.

<program> → <global_decl_list>
<global_decl_list> → <global_decl_list> <global_decl> | ε
<global_decl> → <decl_list> <function_decl>
<function_decl> → <type> id ( <param_list> ) { <block> }
<block> → <decl_list> <statement_list> | ε
<decl_list> → <decl_list> <decl> | <decl> | ε
<decl> → <type_decl> | <var_decl>
<type> → void | int | float
<statement_list> → ….
<statement> → { <block> }

Notational Conventions

The following symbols are terminals:

  • Lower case letters early in the alphabet, such as a, b, c
  • Operators (+, -, etc.) and punctuation symbols (parentheses, commas, etc.)
  • Digits such as 0, 1, 2, etc.
  • Boldface strings such as id or if

Notational Conventions

Nonterminals

  • Upper case letters such as A, B, C
  • The letter S, the start symbol
  • Lower case italic names such as expr or stmt

Grammar symbols

  • Upper case letters late in the alphabet, such as X, Y, Z

Strings of terminals

  • Lower case letters late in the alphabet, such as u, v, …, z

Strings of grammar symbols

  • Lower-case Greek letters, such as α, β, γ

Example

expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑

Using the notational shorthand

E → E A E | (E) | -E | id
A → + | - | * | / | ↑

Nonterminals: E and A
Start symbol: E

Derivation

Given a string αAβ, if A → γ is a production, then we can replace A by γ, written as αAβ ⇒ αγβ.

  • ⇒ means derives in one step
  • ⇒+ means derives in one or more steps
  • ⇒* means derives in zero or more steps

The language L(G) generated by G is the set of terminal strings w such that S ⇒+ w. The string w is called a sentence of G.

If S ⇒* α, where α may contain nonterminals, we say α is a sentential form of G.

Exercise

What is a sentence of the language L defined by the C++ grammar G?

  Answer: a C++ program

Is the following string a sentence or a sentential form?

  int parse(<parameter_list>) {}

  Answer: a sentential form (it still contains the nonterminal <parameter_list>)

Derivation (cont.)

Consider the following grammar G0

E → E + E | E * E | (E) | -E | id

The string -(id + id) is a sentence of G0 because there is a derivation

E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)

  • Leftmost derivation: only the leftmost nonterminal is replaced
  • Rightmost derivation: only the rightmost nonterminal is replaced

Exercise: is id-id a sentence of G0? Is -id+id a sentence?

  Answers: No (G0 has no binary minus). Yes (E ⇒ E + E ⇒ -E + E ⇒ -id + E ⇒ -id + id).
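The exercise can be checked mechanically. The sketch below is only an illustration, not part of the slides: it searches leftmost derivations of G0 breadth-first and prunes any sentential form longer than the target. The pruning is valid only under the assumption that the grammar has no ε-productions, so that sentential forms never shrink. The names `G0`, `is_sentence` and the token spelling are hypothetical.

```python
from collections import deque

# G0 from the slide: E -> E + E | E * E | ( E ) | - E | id
G0 = {"E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("-", "E"), ("id",)]}

def is_sentence(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    Assumes no epsilon-productions: a sentential form never gets shorter,
    so forms longer than the target can be discarded.
    """
    target = tuple(tokens)
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        # Leftmost derivation: expand only the first nonterminal in the form.
        for i, symbol in enumerate(form):
            if symbol in grammar:
                for body in grammar[symbol]:
                    new_form = form[:i] + body + form[i + 1:]
                    if len(new_form) <= len(target) and new_form not in seen:
                        seen.add(new_form)
                        queue.append(new_form)
                break
    return False

print(is_sentence(G0, "E", ["-", "(", "id", "+", "id", ")"]))
print(is_sentence(G0, "E", ["id", "-", "id"]))
print(is_sentence(G0, "E", ["-", "id", "+", "id"]))
```

Restricting the search to leftmost derivations loses nothing, because every sentence of a CFG has a leftmost derivation.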

Parse Tree and Derivation

A parse tree can be viewed as a graphical representation for a derivation that ignores replacement order.

E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)

        E
       / \
      -   E
         /|\
        ( E )
         /|\
        E + E
        |   |
        id  id

  • Interior node: nonterminal
  • Leaves: terminal
  • Children: right-hand side
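As a small illustration of the three properties above, the parse tree for -(id + id) can be written as nested (symbol, children) pairs. This representation, and the helper names `leaf` and `frontier`, are assumptions made for the sketch, which also assumes no ε leaves. Reading the leaves from left to right gives back the sentence.

```python
def leaf(token):
    """A leaf holds a terminal and has no children."""
    return (token, [])

# Interior nodes are nonterminals; their children spell out the
# right-hand side of the production that was applied.
tree = ("E", [
    leaf("-"),
    ("E", [
        leaf("("),
        ("E", [
            ("E", [leaf("id")]),
            leaf("+"),
            ("E", [leaf("id")]),
        ]),
        leaf(")"),
    ]),
])

def frontier(node):
    """Collect the leaves of the tree from left to right."""
    symbol, children = node
    if not children:
        return [symbol]
    result = []
    for child in children:
        result.extend(frontier(child))
    return result

print(" ".join(frontier(tree)))
```

The same tree corresponds to both the leftmost and the rightmost derivation of the sentence, which is why the tree is said to ignore replacement order.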

CFG is more powerful than RE

Every RE can be described by a CFG

Example: (a|b)*abb

A → aA | bA | abb

Converting an NFA into a CFG

  • For each state i of the NFA, create a nonterminal symbol Ai
  • If state i goes to state j on input a, add production Ai → aAj
  • Add Ai → Aj if state i goes to j on ε
  • Add Ai → ε if state i is an accepting state
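The four conversion rules translate almost line for line into code. The following is a minimal sketch under assumed conventions that are not from the slides: transitions are (source, symbol, destination) triples, `None` stands for an ε move, and an empty list stands for an ε right-hand side. The NFA used is one possible automaton for (a|b)*abb.

```python
def nfa_to_cfg(states, transitions, accepting):
    """Build productions {nonterminal: [right-hand sides]} from an NFA."""
    productions = {f"A{s}": [] for s in states}       # one nonterminal per state
    for source, symbol, destination in transitions:
        if symbol is None:
            # epsilon move i -> j gives Ai -> Aj
            productions[f"A{source}"].append([f"A{destination}"])
        else:
            # move i -> j on input a gives Ai -> a Aj
            productions[f"A{source}"].append([symbol, f"A{destination}"])
    for s in accepting:
        productions[f"A{s}"].append([])                # accepting state gives Ai -> epsilon
    return productions

# An NFA for (a|b)*abb with start state 0 and accepting state 3.
nfa_transitions = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
grammar = nfa_to_cfg([0, 1, 2, 3], nfa_transitions, accepting=[3])

for head, bodies in grammar.items():
    print(head, "->", " | ".join(" ".join(body) or "ε" for body in bodies))
```

The nonterminal for the start state (A0 here) becomes the start symbol of the grammar.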

Why do we need RE?

  • RE is sufficiently powerful for lexical rules
  • RE is more concise and easier to understand
  • A more efficient lexical analyzer can be constructed from an RE than from a CFG
  • Separating the lexical from the nonlexical part has a few advantages, such as modularization and easier porting

Exercise: what if we don't have token definitions?

Defects in CFG

Useless nonterminals

  S → A | B
  A → a
  B → Bb      <derives no terminal string>
  C → c       <unreachable>

Ambiguity

Top-Down parsing issues

  • Left recursion
  • Left factoring
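Both kinds of useless nonterminal in the example can be found with two simple fixed-point computations. The sketch below is an illustration under assumed conventions (a grammar is a dict from nonterminal to a list of right-hand sides, and any symbol that is not a key is a terminal). The function name `useless_nonterminals` is hypothetical, and the two defects are reported separately rather than removed.

```python
def useless_nonterminals(grammar, start):
    """Return (nonterminals deriving no terminal string, unreachable nonterminals)."""
    # A nonterminal is productive if some right-hand side consists only of
    # terminals and nonterminals already known to be productive.
    productive = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in productive:
                continue
            for body in bodies:
                if all(sym not in grammar or sym in productive for sym in body):
                    productive.add(head)
                    changed = True
                    break

    # A nonterminal is reachable if it appears in some right-hand side
    # of a nonterminal reachable from the start symbol.
    reachable = {start}
    pending = [start]
    while pending:
        head = pending.pop()
        for body in grammar[head]:
            for sym in body:
                if sym in grammar and sym not in reachable:
                    reachable.add(sym)
                    pending.append(sym)

    return set(grammar) - productive, set(grammar) - reachable

example = {
    "S": [["A"], ["B"]],
    "A": [["a"]],
    "B": [["B", "b"]],
    "C": [["c"]],
}
nonproductive, unreachable = useless_nonterminals(example, "S")
print(sorted(nonproductive), sorted(unreachable))
```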

Ambiguity

A grammar is ambiguous if it produces more than one parse tree for some sentence.

  • example 1: A+B+C (is it (A+B)+C or A+(B+C)?)
    Improper production: expr → expr + expr | id

  • example 2: A+B*C (is it (A+B)*C or A+(B*C)?)
    Improper production: expr → expr + expr | expr * expr

  • example 3: if E1 then if E2 then S1 else S2 (which then does the else match with?)
    Improper production:
    stmt → if expr then stmt
         | if expr then stmt else stmt
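To make example 1 concrete, the sketch below counts the parse trees that expr → expr + expr | id assigns to a token list. It is an illustration only: the function name `count_parse_trees` and the token spelling are assumptions. Any count above one demonstrates ambiguity.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for `tokens` under expr -> expr + expr | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways expr can derive tokens[i:j].
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1                      # expr -> id
        for k in range(i + 1, j - 1):
            if tokens[k] == "+":            # expr -> expr + expr, split at this "+"
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id"]))
print(count_parse_trees(["id", "+", "id", "+", "id"]))
```

For id + id + id the two trees correspond to the groupings (A+B)+C and A+(B+C) from the slide.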

Two parse trees of example 3

Tree 1 (the else matches the inner if):

  stmt
    if E1 then stmt
                 if E2 then S1 else S2

Tree 2 (the else matches the outer if):

  stmt
    if E1 then stmt else S2
                 if E2 then S1

Eliminating Ambiguity

Operator Associativity

  expr → expr + term | term

Operator Precedence

  expr → expr + term | term
  term → term * factor | factor

Dangling Else

  stmt → matched | unmatched
  matched → if expr then matched else matched
          | other
  unmatched → if expr then stmt
            | if expr then matched else unmatched

  (other stands for any statement that is not an if statement)
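The effect of the expr/term/factor rewrite can be seen by parsing the two ambiguous examples and printing the grouping. This is a hedged sketch: it assumes factor is a single operand token, and it uses loops in place of the left-recursive productions (a common hand-written equivalent), so it is not a literal transcription of the grammar. The name `parenthesize` is hypothetical.

```python
def parenthesize(tokens):
    """Parse tokens with expr/term/factor and return a fully parenthesized string."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("+", "*"):
            raise SyntaxError(f"operand expected at position {pos}")
        token = tokens[pos]
        pos += 1
        return token

    def term():
        # term -> term * factor | factor, written as a loop (left-associative)
        nonlocal pos
        left = factor()
        while peek() == "*":
            pos += 1
            left = f"({left}*{factor()})"
        return left

    def expr():
        # expr -> expr + term | term, written as a loop (left-associative)
        nonlocal pos
        left = term()
        while peek() == "+":
            pos += 1
            left = f"({left}+{term()})"
        return left

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return result

print(parenthesize(["A", "+", "B", "+", "C"]))
print(parenthesize(["A", "+", "B", "*", "C"]))
```

Because * is handled one level below +, it binds tighter, and because each level folds its operands from the left, both operators are left-associative.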

Eliminating Left Recursion

Immediate left recursion

  Example: A → Aα | β

Transformation

  A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn

  where no βi begins with A. We replace the A-productions by

  A → β1A' | β2A' | … | βnA'
  A' → α1A' | α2A' | … | αmA' | ε
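The transformation can be sketched in a few lines. The representation is an assumption made for illustration: a right-hand side is a list of symbols, the empty list stands for ε, and the new nonterminal is named by appending a prime. The function name `eliminate_immediate` is hypothetical.

```python
def eliminate_immediate(head, bodies):
    """Remove immediate left recursion from the productions of `head`."""
    alphas = [body[1:] for body in bodies if body and body[0] == head]   # A -> A alpha
    betas = [body for body in bodies if not body or body[0] != head]     # A -> beta
    if not alphas:
        return {head: bodies}            # nothing to do
    fresh = head + "'"
    return {
        head: [beta + [fresh] for beta in betas],                 # A  -> beta A'
        fresh: [alpha + [fresh] for alpha in alphas] + [[]],      # A' -> alpha A' | epsilon
    }

# expr -> expr + term | term
result = eliminate_immediate("expr", [["expr", "+", "term"], ["term"]])
for nonterminal, rhs_list in result.items():
    print(nonterminal, "->", " | ".join(" ".join(rhs) or "ε" for rhs in rhs_list))
```

The rewritten grammar generates the same strings, but the recursion now sits on the right, which a top-down parser can handle.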

Indirect Left Recursion

Example:

  S → Aa | b
  A → Ac | Sd | ε

Transformation (assuming no cycles A ⇒+ A)

  1. Arrange the nonterminals in order A1, A2, …, An
  2. for i := 1 to n do begin
       for j := 1 to i-1 do begin
         Replace Ai → Ajγ by Ai → δ1γ | … | δkγ,
         where Aj → δ1 | … | δk are the current Aj productions
       end
       Eliminate the immediate left recursion among the Ai productions
     end

In the above example,

  S → Aa | b
  A → Ac | Sd | ε

A → Sd will be replaced, giving

  A → Ac | Aad | bd | ε

Eliminating the immediate left recursion among the A-productions then yields the following:

  S → Aa | b
  A → bdA' | A'
  A' → cA' | adA' | ε

Algorithm 4.1 Eliminating Left Recursion

  • This algorithm systematically eliminates left recursion from a grammar.
  • This is about how to remove indirect left recursion.
  • Precondition: the grammar has no cycles or ε-productions. A cycle means: A ⇒+ A
      • The precondition avoids getting productions of type A → A during nonterminal replacement.
      • For example, with A → BA, B → Ab | ε, when A → BA is derived to A → A a cycle shows up.
  • ε-productions also make the algorithm more complex, because A → BCD may be derived to A → CD, so handling only the leftmost nonterminal is not sufficient.

Indirect Left Recursion

  A → Bb | a
  B → Cc | b
  C → Dd | c
  D → Aa | d

  A ⇒ Bb ⇒ Ccb ⇒ Ddcb ⇒ Aadcb
  C ⇒ Dd ⇒ Aad ⇒ Bbad ⇒ Ccbad

We need to expose the immediate left recursions and then eliminate them. Some ordering is needed. Suppose we replace A → Bb by A → Ccb and then start with B → Cc ⇒ Ddc ⇒ Aadc ⇒ Ccbadc; this would never expose the immediate left recursion in this example.

Algorithm 4.1

  for i := 1 to n do begin
    for j := 1 to i-1 do begin
      replace each production of the form Ai → Ajγ by the productions
      Ai → δ1γ | δ2γ | … | δkγ,
      where Aj → δ1 | δ2 | … | δk are the current Aj-productions
    end
    eliminate the immediate left recursion among the Ai-productions
  end

Key idea: for each nonterminal Ai, every production that begins with a lower-numbered nonterminal Aj (where j < i) is rewritten so that it begins with a terminal or with a higher-numbered nonterminal.
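The two nested loops can be sketched directly. The code below is an illustrative reading of Algorithm 4.1, not a reference implementation: the grammar is assumed to be a dict from nonterminal to a list of right-hand sides (lists of symbols, with the empty list for ε), the caller supplies the ordering A1, …, An, and new nonterminals are named by appending a prime. It is run on the grammar S → Aa | b, A → Ac | Sd | ε.

```python
def eliminate_left_recursion(grammar, order):
    """Apply the substitution and elimination steps for each Ai in `order`."""
    g = {head: [list(body) for body in bodies] for head, bodies in grammar.items()}
    for i, ai in enumerate(order):
        # Inner loop: replace Ai -> Aj gamma by Ai -> delta gamma for each j < i.
        for aj in order[:i]:
            expanded = []
            for body in g[ai]:
                if body and body[0] == aj:
                    expanded.extend(delta + body[1:] for delta in g[aj])
                else:
                    expanded.append(body)
            g[ai] = expanded
        # Eliminate immediate left recursion among the Ai-productions.
        alphas = [body[1:] for body in g[ai] if body and body[0] == ai]
        betas = [body for body in g[ai] if not body or body[0] != ai]
        if alphas:
            fresh = ai + "'"
            g[ai] = [beta + [fresh] for beta in betas]
            g[fresh] = [alpha + [fresh] for alpha in alphas] + [[]]
    return g

source = {
    "S": [["A", "a"], ["b"]],
    "A": [["A", "c"], ["S", "d"], []],
}
rewritten = eliminate_left_recursion(source, ["S", "A"])
for head, bodies in rewritten.items():
    print(head, "->", " | ".join(" ".join(body) or "ε" for body in bodies))
```

The result depends on the chosen ordering; a different order of the nonterminals gives a different but equivalent grammar.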

A1 → …
A2 → Ai-1 Ai+k …
…
Ai → Ai-1 … | A2 …
…
An

After replacement, there will be no backward references.

Left Factoring

Consider the following grammar

  A → αβ1 | αβ2

It is not easy to determine whether to expand A to αβ1 or αβ2. A transformation called left factoring can be applied. It becomes:

  A → αA'
  A' → β1 | β2

Exercise

  stmt → if expr then stmt
       | if expr then stmt else stmt

For the following grammar form:

  A → αβ1 | αβ2

What is α? β1? β2?

  α: if expr then stmt
  β1: ε
  β2: else stmt
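The left-factoring transformation can be tried on the exercise grammar with a short sketch. The assumptions here are for illustration only: right-hand sides are lists of symbols, the empty list stands for ε, alternatives are grouped by their first symbol, and only one pass is made, so a common prefix remaining inside a new nonterminal is not factored again. The names `common_prefix` and `left_factor` are hypothetical.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared by the start of every body."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(head, bodies):
    """One pass of left factoring over the alternatives of `head`."""
    groups = {}
    for body in bodies:
        key = body[0] if body else None          # group by first symbol
        groups.setdefault(key, []).append(body)

    result = {head: []}
    fresh = head
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[head].extend(group)           # nothing to factor
            continue
        alpha = common_prefix(group)
        fresh += "'"                              # new nonterminal A', A'', ...
        result[head].append(alpha + [fresh])      # A  -> alpha A'
        result[fresh] = [body[len(alpha):] for body in group]   # A' -> beta1 | beta2
    return result

factored = left_factor("stmt", [
    ["if", "expr", "then", "stmt"],
    ["if", "expr", "then", "stmt", "else", "stmt"],
])
for nonterminal, alternatives in factored.items():
    print(nonterminal, "->", " | ".join(" ".join(alt) or "ε" for alt in alternatives))
```

After factoring, a top-down parser can consume the common part first and postpone the choice until it sees whether an else follows.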

Syntax Error Handling

Different types of errors

  • Lexical
  • Syntactic
  • Semantic
  • Logical

Error handling goals

  • Report errors clearly and accurately
  • Recover quickly, so that subsequent errors can be detected
  • Fast: do not significantly slow down the processing of correct programs

Error Handling Strategies

  • Don't quit after detecting the 1st error.
  • Avoid introducing "spurious" errors.
  • Inhibit error messages that stem from errors uncovered too close together.
  • Simple error repair will be sufficient due to the increasing emphasis on interactive computing and good programming environments.

Error Recovery Strategies

  • Panic mode
      Deleting input tokens until one of a designated set of synchronizing tokens is found.
  • Phrase level
      Local correction to repair punctuation errors.
  • Error productions
      Augment the grammar with error productions.
  • Global correction
      Globally least-cost correction to a string; costly to implement.
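Panic mode, the simplest of the four strategies, can be sketched with a toy language. Everything in this example is an assumption made for illustration: the only statement form is `id = num ;`, tokens are plain strings, and `;` is the single synchronizing token. On an error the parser records the position, discards tokens up to and including the next `;`, and resumes.

```python
def parse_with_panic_mode(tokens, sync_tokens=(";",)):
    """Parse a sequence of `id = num ;` statements with panic-mode recovery.

    Returns (number of well-formed statements, list of error positions).
    """
    expected = ["id", "=", "num", ";"]
    statements = 0
    errors = []
    pos = 0
    while pos < len(tokens):
        if tokens[pos:pos + 4] == expected:
            statements += 1
            pos += 4
            continue
        errors.append(pos)                 # report one error for this statement
        # Panic mode: delete input tokens until a synchronizing token is found.
        while pos < len(tokens) and tokens[pos] not in sync_tokens:
            pos += 1
        pos += 1                           # discard the synchronizing token too
    return statements, errors

program = ["id", "=", "num", ";",
           "id", "num", ";",               # missing "="
           "id", "=", "num", ";"]
print(parse_with_panic_mode(program))
```

Skipping to the synchronizing token means the malformed statement produces a single message and the following statement is still checked, in line with the goals of not quitting after the first error and avoiding spurious errors.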