EPL323 - Theory and Practice of Compilers
Lecture 5a: Syntax Analysis
Elias Athanasopoulos

Syntax Analysis
• Context-free Grammars (CFGs)
• Derivations
• Parse trees
• Top-down Parsing
• Ambiguities
Syntax Analysis
• Syntax analysis (parsing) is the process of determining whether a string of tokens can be generated by a grammar
• Example: the sentence “I gave him the book”
– subject: I
– verb: gave
– indirect object: him
– object: noun phrase (article: the, noun: book)
Lexical-Syntax Analysis
[Figure: source code (character stream) → Lexical Analysis → token stream → Syntax Analysis → syntax tree]
• Source code: { if (b == 0) a = b; while (a != 1) { printf("%I ", I--); } }
• Token stream: if ( b == 0 ) a = b ; ...
• Syntax tree: a block containing an if_stmt (condition expr: variable b == constant 0; body expr: variable a = variable b) and a while_stmt (condition expr: variable a != constant 1; body: block ...)
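The Role of the Parser
[Figure: source program → lexical analyzer → (token) → parser → (parse tree) → rest of front end. The parser asks the lexical analyzer for each token with “get next token”; both consult the symbol table.]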
Syntax Analysis Operation
• Input
– A stream of tokens produced by lexical analysis
• Output
– A syntax tree, which captures the relations between tokens and establishes syntactic correctness (are all parentheses balanced?)
• Semantic analysis takes care of types
– int x = true;
– int y; z = f(y);
Syntax Error Handling
• Lexical
– Misspelling an identifier, keyword, or operator
• Syntactic
– Arithmetic expression with unbalanced parentheses
• Semantic
– Operator applied to an incompatible operand
• Logical
– Infinitely recursive call
Error Handler Requirements
• It should report the presence of errors clearly and accurately
• It should recover from each error quickly enough to be able to detect subsequent errors
• It should not significantly slow down the processing of correct programs
What happens when an error is detected?
• Many strategies, none clearly dominates
• It is not adequate for the parser to quit upon detecting the first error
– Subsequent parsing may reveal additional errors
• Usually, the compiler attempts error recovery
– Reasonable hope that the rest of the program can be parsed
• Error recovery should be implemented correctly
– Otherwise many spurious errors can be generated
Example
• While recovering from an error, a compiler may skip the declaration of a variable zap
• At a later point, when zap is used, the compiler should not generate a syntactic error, but only report the missing declaration
– Since there will be no entry for zap in the symbol table
• Conservative strategy
– Once an error is detected, filter out nearby errors (consume enough tokens to exit the error area)
Error-recovery Strategies
• Panic mode
– Once an error is detected, consume tokens until a synchronizing token is detected
– Synchronizing tokens are usually delimiters (end, ;), which have a clear meaning
– Simple and cannot enter an infinite loop
• Phrase level
– Attempt to correct the error by taking action
– Insert a missing semicolon, replace a comma with a semicolon, etc.
– Can create infinite loops if actions are not applied correctly
– Hard to cope with cases where the error has occurred before the point of detection
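Panic-mode recovery can be illustrated with a minimal Python sketch. The function name `panic_mode_skip`, the token list, and the synchronizing set are assumptions made for this illustration only; a real parser would choose its synchronizing tokens per construct.

```python
# Assumed synchronizing set, taken from the delimiters named on the slide.
SYNC_TOKENS = {";", "end"}

def panic_mode_skip(tokens, pos):
    """Return the index just past the next synchronizing token at or after pos."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1  # discard a token that lies inside the error area
    # Also consume the synchronizing token itself, if one was found.
    return min(pos + 1, len(tokens))

# Hypothetical token stream: "a = + * 3 ; b = 1 ;" with an error detected at "+".
tokens = ["a", "=", "+", "*", "3", ";", "b", "=", "1", ";"]
resume_at = panic_mode_skip(tokens, 2)
```

Here parsing would resume at the token "b". Because the position only moves forward, the skip loop always terminates, which matches the claim that panic mode cannot enter an infinite loop.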
Error-recovery Strategies
• Error productions
– Productions for common errors can be added to the grammar of the language
– The parser can then detect these errors, since the erroneous constructs are matched by the augmented grammar
• Global correction
– Attempt to correct an error with the fewest possible actions
– Given an incorrect input string x and grammar G, find a valid y, which can be derived from x with the least amount of changes
– The closest correct program may not be the one the programmer had in mind
CONTEXT-FREE GRAMMARS
Regular Expressions Limitations
• Regular expressions can be transformed easily to an NFA (and then to a DFA)
• Discovering and classifying tokens using regular expressions is easy and efficient
• Regular expressions cannot be used for syntax analysis
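Regular Expressions Limitations
• Match all balanced parentheses:
– () (()) ()()() (())()((()()))
• You would need an NFA with an infinite number of states
[Figure: the NFA needed for 5 nested parentheses: a chain of states starting at S, moving one state forward on each "(" and one state back on each ")"]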
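A short Python sketch makes the limitation concrete: recognizing balanced parentheses requires tracking the current nesting depth, and that depth has no upper bound. The function name `is_balanced` is an assumption for this illustration.

```python
def is_balanced(text):
    """Return True if text consists only of correctly nested parentheses."""
    depth = 0  # current nesting depth; can grow without bound
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ")" appeared with no matching "("
        else:
            return False  # this sketch accepts parentheses only
    return depth == 0  # every "(" must have been closed
```

The counter plays the role of the chain of states in the figure: a finite automaton can only encode depths up to a fixed limit, whereas the counter is unbounded.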
Context-free Grammar (CFG)
1. A set of tokens, known as terminal symbols.
– Terminals are the basic symbols from which strings are formed. The word “token” is a synonym for “terminal” when we are talking about programming languages (e.g., tokens like if, then, and else are all terminals).
2. A set of nonterminals.
– Nonterminals are syntactic variables that denote sets of strings. The nonterminals define sets of strings that help define the language generated by the grammar. They also impose a hierarchical structure on the language defined by the grammar.
3. A set of productions, where each production consists of a nonterminal, called the left side of the production, an arrow, and a sequence of tokens and/or nonterminals, called the right side of the production.
– The productions of the grammar specify the manner in which the terminals and nonterminals can be combined to form strings. The symbol ::= is sometimes used in place of the arrow.
4. A designation of one of the nonterminals as the start symbol.
– In a grammar, one nonterminal is distinguished as the start symbol, and the set of strings it denotes is the language defined by the grammar.
Example 1
• Expressions of digits separated by plus and minus signs
– 9-5+2, 3-1, 7
list → list + digit (2.2)
list → list - digit (2.3)
list → digit (2.4)
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 (2.5)
The first three productions can be grouped:
list → list + digit | list - digit | digit
Terminals/Tokens: + - 0 1 2 3 4 5 6 7 8 9
Nonterminals: list, digit
Start symbol: list
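The four components of a context-free grammar map naturally onto a small data structure. The sketch below encodes the list/digit grammar as a Python dictionary; the layout and the helper `is_well_formed` are assumptions chosen for illustration, not a standard representation.

```python
# The list/digit grammar written as its four components.
GRAMMAR = {
    "terminals": set("+-0123456789"),
    "nonterminals": {"list", "digit"},
    "start": "list",
    "productions": {
        # left side -> list of alternative right sides
        "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
        "digit": [[d] for d in "0123456789"],
    },
}

def is_well_formed(g):
    """Check that the four components are mutually consistent."""
    symbols = g["terminals"] | g["nonterminals"]
    if g["start"] not in g["nonterminals"]:
        return False  # the start symbol must be a nonterminal
    if g["terminals"] & g["nonterminals"]:
        return False  # a symbol cannot be both terminal and nonterminal
    for left, alternatives in g["productions"].items():
        if left not in g["nonterminals"]:
            return False  # left sides are single nonterminals
        for right in alternatives:
            if any(sym not in symbols for sym in right):
                return False  # right sides use only known symbols
    return True
```

Requiring every left side to be a single nonterminal is what makes the grammar context-free: a nonterminal can be replaced regardless of the symbols around it.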
Example 1
• The ten productions for the nonterminal digit allow it to stand for any of the tokens 0, 1, ..., 9
• From 2.4, a single digit by itself is a list
• 2.2 and 2.3 express the fact that if we take any list and follow it by a plus or minus sign and then another digit, we have a new list
Example: 9-5+2
• 9 is a list by production 2.4, since 9 is a digit
• 9-5 is a list by production 2.3, since 9 is a list and 5 is a digit
• 9-5+2 is a list by production 2.2, since 9-5 is a list and 2 is a digit
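The reasoning used for 9-5+2 can be turned into a small recognizer. The Python sketch below is a hand-written check specific to this grammar (the names `is_digit` and `is_list` are assumptions for the illustration); it is not a general parsing algorithm.

```python
def is_digit(tok):
    """digit -> 0 | 1 | ... | 9 (production 2.5)."""
    return len(tok) == 1 and tok in "0123456789"

def is_list(s):
    """Return True if the string s can be generated from the nonterminal list."""
    # Production 2.4: a single digit by itself is a list.
    if not s or not is_digit(s[0]):
        return False
    rest = s[1:]
    # Productions 2.2 and 2.3: a list followed by + or - and a digit is a list.
    while rest:
        if len(rest) < 2 or rest[0] not in "+-" or not is_digit(rest[1]):
            return False
        rest = rest[2:]
    return True
```

For 9-5+2 the function first accepts 9 as a list, then extends it to 9-5, then to 9-5+2, which are the same three steps listed above.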
Example 2
• “Begin End” block in Pascal
begin
  ... (* Pascal code *)
end
block → begin opt_stmts end
opt_stmts → stmt_list | ε
stmt_list → stmt_list ; stmt | stmt
(stmt is not expanded at this point)
Example 3
• Simple arithmetic expressions
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ^
Equivalent to:
E → E A E | (E) | -E | id
A → + | - | * | / | ^
Derivation
E → E A E | (E) | -E | id
• The production E → -E signifies that an expression preceded by a minus sign is also an expression
• We can thus generate more complex expressions from simpler expressions by just replacing E with -E
Derivation
E => -E (E derives -E)
Examples
• Using E → (E): E*E => (E)*E or E*E => E*(E)
• E => -E => -(E) => -(id)
Notation
• => derives in one step
• =>* derives in zero or more steps
• =>+ derives in one or more steps
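Leftmost - Rightmost
E → E A E | (E) | -E | id (G1)
A → + | - | * | / | ^
Leftmost derivation (each step, written =>lm, replaces the leftmost nonterminal)
E =>lm -E =>lm -(E) =>lm -(E+E) =>lm -(id+E) =>lm -(id+id)
Rightmost derivation (each step, written =>rm, replaces the rightmost nonterminal)
E =>rm -E =>rm -(E) =>rm -(E+E) =>rm -(E+id) =>rm -(id+id)
The string -(id + id) is a sentence of grammar G1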
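The difference between the two derivation orders can be replayed mechanically. The Python sketch below encodes G1 and applies a chosen production to either the leftmost or the rightmost nonterminal; the names `step` and `derive` are assumptions for this illustration. Unlike the slide, which writes E+E directly, the sketch expands A in a separate step, so each trace has six steps rather than five.

```python
# Grammar G1: nonterminal -> list of alternative right sides.
G1 = {
    "E": [["E", "A", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
    "A": [["+"], ["-"], ["*"], ["/"], ["^"]],
}

def step(form, right_side, leftmost=True):
    """Replace the leftmost (or rightmost) nonterminal of form with right_side."""
    positions = [i for i, sym in enumerate(form) if sym in G1]
    if not positions:
        raise ValueError("no nonterminal left to expand")
    i = positions[0] if leftmost else positions[-1]
    if right_side not in G1[form[i]]:
        raise ValueError("not a production of " + form[i])
    return form[:i] + right_side + form[i + 1:]

def derive(choices, leftmost=True):
    """Return the list of sentential forms, starting from the start symbol E."""
    form = ["E"]
    trace = [" ".join(form)]
    for right_side in choices:
        form = step(form, right_side, leftmost)
        trace.append(" ".join(form))
    return trace

# The same sequence of right sides works for both orders in this example.
choices = [["-", "E"], ["(", "E", ")"], ["E", "A", "E"], ["id"], ["+"], ["id"]]
lm_trace = derive(choices, leftmost=True)
rm_trace = derive(choices, leftmost=False)
```

Both traces end in the same sentence; they differ only in the intermediate sentential forms, for example - ( id A E ) in the leftmost trace versus - ( E A id ) in the rightmost one.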
Grammars and Languages
• Given a grammar G with a start symbol S,
– A string of only terminals, w, is in L(G) iff S =>+ w
– The string w is called a sentence of G
– L(G) is the language generated by G and includes all such w (strings composed of terminals of G)
• A language that can be generated by a grammar is a context-free language
• If two grammars generate the same language, then they are equivalent
Parse Trees
A parse tree may be viewed as a graphical representation of a derivation that filters out the choice regarding replacement order.
[Figure: parse tree for -(id+id)]
E
├─ -
└─ E
   ├─ (
   ├─ E
   │  ├─ E ── id
   │  ├─ +
   │  └─ E ── id
   └─ )
E =>lm -E =>lm -(E) =>lm -(E+E) =>lm -(id+E) =>lm -(id+id)
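A parse tree can be represented as nested data, and reading its leaves from left to right gives back the derived sentence. The Python sketch below uses (symbol, children) tuples; this representation and the names `leaf`, `TREE` and `tree_yield` are assumptions made for illustration.

```python
def leaf(symbol):
    """A leaf node: a terminal with no children."""
    return (symbol, [])

# Parse tree for -(id+id), following the slide (E + E is written directly).
TREE = ("E", [
    leaf("-"),
    ("E", [
        leaf("("),
        ("E", [("E", [leaf("id")]), leaf("+"), ("E", [leaf("id")])]),
        leaf(")"),
    ]),
])

def tree_yield(node):
    """Return the leaves of the tree, read from left to right."""
    symbol, children = node
    if not children:
        return [symbol]
    result = []
    for child in children:
        result.extend(tree_yield(child))
    return result
```

The tree records which production was applied at each node but not the order in which nodes were expanded, so the leftmost and the rightmost derivation of -(id+id) both correspond to this one tree.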
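Constructing the Parse Tree
[Figure: the parse tree for -(id+id) built step by step, one snapshot per sentential form of the derivation E => -E => -(E) => -(E+E) => -(id+E) => -(id+id); each step expands one leaf E into the children given by the production applied.]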
Ambiguity
• A grammar that produces more than one parse tree for some sentence is said to be ambiguous
• For certain types of parsers, it is desirable that the grammar be made unambiguous
• For some applications we shall also consider methods whereby we can use certain ambiguous grammars, together with disambiguating rules that “throw away” undesirable parse trees
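Ambiguity can be observed by counting parse trees. The Python sketch below counts the parse trees of a token sequence under the fragment E → E A E | id, A → + | - | * | / | ^ of the expression grammar; parentheses and unary minus are deliberately left out to keep the example short, and the function name `count_parse_trees` is an assumption for this illustration.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*", "/", "^"}

def count_parse_trees(tokens):
    """Count parse trees of tokens under E -> E A E | id (fragment only)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways E can derive tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0  # E -> id
        total = 0
        # E -> E A E: try every operator position k as the root operator.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in OPERATORS:
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

trees = count_parse_trees(["id", "+", "id", "*", "id"])
```

The sentence id + id * id has two parse trees, one grouping it as (id + id) * id and one as id + (id * id), so the grammar is ambiguous. A disambiguating rule such as operator precedence would keep only the second tree.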