Compiler Design Spring 2017
3.0 Frontend
Dr. Zoltán Majó
Compiler Group – Java HotSpot Virtual Machine, Oracle Corporation

Different parse trees
§ There are grammars that allow more than one right-most derivation for w ∈ L(G)
  § (Or more than one left-most derivation)
§ Example (right-most)
  § Derivation #1: S ⇒ E ⇒ E Op E ⇒ E Op E Op E ⇒ E Op E Op Id ⇒ E Op E * Id ⇒ E Op Id * Id ⇒ E + Id * Id ⇒ Id + Id * Id
  § Derivation #2: S ⇒ E ⇒ E Op E ⇒ E Op Id ⇒ E * Id ⇒ E Op E * Id ⇒ E Op Id * Id ⇒ E + Id * Id ⇒ Id + Id * Id
  S  → E                 (1)
  E  → E Op E            (2)
     | - E               (3)
     | ( E )             (4)
     | Id                (5)
  Op → + | - | * | /     (6)
  Id: L{L|N}*

Derivations and parse trees
[Figure: two parse trees for Id + Id * Id. Tree #1 groups the input as (Id + Id) * Id; Tree #2 groups it as Id + (Id * Id).]

3.1.3 Ambiguity
§ A grammar that allows more than one parse tree for at least one w ∈ L(G) is called ambiguous
  § Note: Ambiguity is a property of the grammar
  § We give a non-ambiguous grammar for expressions later
§ We need to compare parse trees (and derivations)
  § Comparing derivations is easy if only left-most (right-most) derivations are used
§ Alternative definition: A grammar that allows more than one (right|left)-most derivation for at least one w ∈ L(G) is called ambiguous

Problems w/ ambiguity
§ Compiler does not know how to interpret “a+b*c”
  § Is it Tree #1? I.e., (a + b) * c
  § Or is it Tree #2? I.e., a + (b * c)
§ What can we do?
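
To make the problem concrete, here is a minimal Python sketch that enumerates every parse tree of a token sequence under the ambiguous rule E → E Op E | Id and evaluates each tree. The function names and the sample values for a, b and c are illustrative assumptions, not part of the lecture; only + and * are handled.

```python
def parse_trees(tokens):
    """Return every parse tree of `tokens` under E -> E Op E | Id.

    Assumes a well-formed sequence: operands at even positions,
    operators at odd positions.
    """
    if len(tokens) == 1:
        return [tokens[0]]                      # E -> Id
    trees = []
    # Try every operator position as the root of E -> E Op E.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


def show(tree):
    """Render a tree with explicit parentheses."""
    if isinstance(tree, str):
        return tree
    left, op, right = tree
    return f"({show(left)} {op} {show(right)})"


def evaluate(tree, env):
    """Evaluate a tree; only + and * are supported in this sketch."""
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    l, r = evaluate(left, env), evaluate(right, env)
    return l + r if op == "+" else l * r


tokens = ["a", "+", "b", "*", "c"]
env = {"a": 1, "b": 2, "c": 3}                  # sample values (assumed)
results = [(show(t), evaluate(t, env)) for t in parse_trees(tokens)]
for text, value in results:
    print(text, "=", value)
```

With these sample values the two trees evaluate to different numbers, which is why the compiler cannot simply pick one.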

Addressing ambiguity
§ Change the grammar
  § See later for a better grammar
  § May not always be possible
§ Change the language
§ Add rules that “*” binds more strongly than “+”
  § Precedence
  § Resolves conflicts
§ Bad idea: Let the compiler (writer) decide
  § Or let the user worry
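
One possible way to encode the precedence rule is a precedence-climbing parser, sketched below in Python. This is a swapped-in illustration, not the technique the lecture develops later; the table `PRECEDENCE`, the function name `parse_expr` and the choice of left associativity are assumptions.

```python
# Assumed precedence levels: a higher number binds more strongly.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def parse_expr(tokens, pos=0, min_prec=1):
    """Parse tokens[pos:] and return (tree, next_pos).

    Operands are plain identifier strings; operators are treated as
    left-associative. Input is assumed to alternate operand/operator.
    """
    left = tokens[pos]                          # operand (an Id)
    pos += 1
    while pos < len(tokens) and PRECEDENCE[tokens[pos]] >= min_prec:
        op = tokens[pos]
        # The right operand may only absorb operators that bind
        # more strongly than `op`.
        right, pos = parse_expr(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (left, op, right)
    return left, pos


tree, _ = parse_expr(["a", "+", "b", "*", "c"])
print(tree)
```

With the rule in place, “a+b*c” has exactly one reading, the one corresponding to a + (b * c).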

Another example
§ “If” statement
§ Two forms
  § if (Condition) then (Body)
  § if (Condition) then (Body) else (Body)

Another example
§ Start symbol: S
§ Productions
  S → stmt-list S | stmt-list
  stmt-list → … | if-stmt
  if-stmt → if cond-expr then S
          | if cond-expr then S else S
  cond-expr → …
§ Other statements (assign, function call, …) omitted
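
The two “if” forms make this grammar ambiguous for nested statements. The Python sketch below counts the parse trees of one nested input under a simplified version of the grammar. The simplifications are assumptions made for illustration: a statement is either the placeholder token `s` or an if-statement, and every condition is the placeholder token `c`.

```python
def parses(tokens, pos=0):
    """Return (tree, next_pos) for every way to parse one statement at pos.

    Simplified grammar (assumed for this sketch):
        stmt -> s | if c then stmt | if c then stmt else stmt
    """
    results = []
    if pos < len(tokens) and tokens[pos] == "s":
        results.append(("s", pos + 1))
    if tokens[pos:pos + 3] == ["if", "c", "then"]:
        for body, p in parses(tokens, pos + 3):
            results.append((("if", body), p))               # form without else
            if p < len(tokens) and tokens[p] == "else":
                for alt, q in parses(tokens, p + 1):
                    results.append((("if", body, alt), q))  # form with else
    return results


tokens = "if c then if c then s else s".split()
# Keep only the parses that consume the whole input.
trees = [tree for tree, end in parses(tokens) if end == len(tokens)]
for tree in trees:
    print(tree)
```

The input has two complete parse trees: in one the `else` belongs to the outer `if`, in the other to the inner `if`.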

Ambiguous languages
§ Ambiguity is a property of the grammar
  § One word is enough to show ambiguity
  § How do you show that a grammar is not ambiguous?
    § Proof (for one grammar)
    § Some kinds of grammars are certified unambiguous
      § We will look at those in compiler design
§ Unfortunately there are languages that are inherently ambiguous
  § All grammars that generate such a language are ambiguous
  § Even for Type-2 (context-free) grammars

Transition from parse tree to IR
§ Parse tree
  § Sometimes called concrete syntax tree
  § Interior nodes represent non-terminals
§ Our tree-based IR: Abstract syntax tree
  § Interior nodes represent programming constructs
  § Non-terminals not (directly) preserved
  § Structure close to that of the parse tree
§ Building IR: Via derivations or a separate transformation step
(Slide corrected after lecture – zmajo)
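
Parse tree vs IR
[Figure: concrete syntax tree and abstract syntax tree (IR) for a7 + b. Concrete syntax tree: S at the root, E below it, then E Op E with leaves Id (a7), + and Id (b). Abstract syntax tree: a + node with children VAR a7 and VAR b.]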
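
The separate transformation step can be sketched in Python as a recursive walk over the concrete syntax tree. The tuple encoding of the trees and the function name `to_ast` are assumptions for illustration; only the node labels (S, E, Op, Id, VAR) come from the slides.

```python
# Concrete syntax tree for "a7 + b", encoded as (label, child, ...) tuples.
# Leaves are the token texts.
cst = ("S",
       ("E",
        ("E", ("Id", "a7")),
        ("Op", "+"),
        ("E", ("Id", "b"))))


def to_ast(node):
    """Collapse non-terminals and keep only the programming constructs."""
    label, *children = node
    if label == "Id":
        return ("VAR", children[0])             # identifier becomes a VAR node
    if label == "Op":
        return children[0]                      # keep only the operator symbol
    if len(children) == 1:                      # chain rule such as S -> E or E -> Id
        return to_ast(children[0])
    if label == "E" and len(children) == 3:     # E -> E Op E
        left, op, right = children
        return (to_ast(op), to_ast(left), to_ast(right))
    raise ValueError(f"unsupported node: {node!r}")


ast = to_ast(cst)
print(ast)
```

The interior nodes S and E disappear, while the shape of the expression is preserved.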

Summary
§ Frontend performs two tasks
  § Break input into tokens
  § Check that the sequence of tokens is legal input
    § Find derivation S ⇒* w
§ Goal: produce IR
§ Parse trees capture derivations
§ Our IR is tree-based, so the step from parse tree to IR tree is not that large

3.2 Lexical analysis
3.3 Top-down parsing

Overview
§ 3.1 Introduction
§ 3.2 Lexical analysis
§ 3.3 “Top down” parsing
§ 3.4 “Bottom up” parsing

Outline
§ Using multiple grammars to save work and simplify
  § Use in compiler front-end
§ Top-down parsing
  § Simple backtracking parsers
  § Simple predictive parsers

3.2 Lexical analysis
§ Use regular expressions to describe elements of the language
  § Names of variables, fields, methods, classes, …
  § Constants (int, float, double, hex, …)
  § Keywords of the language (if then else while class …)
  § Example (from previous lecture):
    § Id: L{L|N}*
    § L = {a|b|c|…|z}
    § N = {0|1|2|…|9}
§ Regular expressions → DFA
  § Automatic construction easy
  § DFA produces the tokens
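
As an illustration, the DFA for Id: L{L|N}* can be written by hand in a few lines of Python. The state numbering and the function names are assumptions; a generated DFA would typically use a transition table instead.

```python
import string

LETTERS = set(string.ascii_lowercase)   # L = {a | b | ... | z}
DIGITS = set(string.digits)             # N = {0 | 1 | ... | 9}

# States: 0 = start, 1 = inside an identifier (the only accepting state).
START, IN_ID = 0, 1


def step(state, ch):
    """Return the next state, or None if the DFA has no transition."""
    if state == START and ch in LETTERS:
        return IN_ID
    if state == IN_ID and (ch in LETTERS or ch in DIGITS):
        return IN_ID
    return None


def is_id(word):
    """True if `word` matches L{L|N}*."""
    state = START
    for ch in word:
        state = step(state, ch)
        if state is None:
            return False
    return state == IN_ID


print(is_id("a7"), is_id("7a"))
```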

How it works
[Figure: the source program is read by the lexer (or scanner), a DFA, which produces the tokens Id(b) Term(+) Id(a3); the parser (analyzer) consumes the tokens and answers Yes or No.]

Token assembly
§ First (part of) answer: Stop when encountering a character that does not belong to the current token
§ For many languages: Stop when encountering whitespace
§ Whitespace: Invisible and/or irrelevant for the program
  § Look at C, C++, Java
    § <space> ␣
    § Newline, form feed, CR (carriage return)
    § Tab
    § Comments
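
The stopping rules above can be sketched as a small hand-written tokenizer in Python. The token kinds Id and Term follow the naming in the figure; the function names are assumptions, and comments are not handled in this sketch.

```python
WHITESPACE = " \t\n\r\f"    # space, tab, newline, carriage return, form feed
OPERATORS = "+-*/"


def is_letter(ch):
    return "a" <= ch <= "z"


def is_digit(ch):
    return "0" <= ch <= "9"


def tokenize(source):
    """Split `source` into (kind, text) tokens."""
    tokens, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch in WHITESPACE:
            i += 1                              # whitespace only separates tokens
        elif is_letter(ch):
            start = i
            # Stop at the first character that cannot belong to an Id.
            while i < len(source) and (is_letter(source[i]) or is_digit(source[i])):
                i += 1
            tokens.append(("Id", source[start:i]))
        elif ch in OPERATORS:
            tokens.append(("Term", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


print(tokenize("b + a3"))
```

The same token sequence results for "b+a3": the "+" ends the identifier even without whitespace.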

Comments and whitespace
§ Some languages attach meaning to whitespace
  § Nesting level in Python
  § “make” utility
  § Warning: macro facilities, pragma
§ Not all comments are whitespace
  § Directives hidden in comments
  § Example: Fortran 90 comments start with “!”
    (the directive !DEC$ IVDEP means: ignore vector dependencies)
    !DEC$ IVDEP
    DO I = 1, N
      A(INDARR(I)) = A(INDARR(I)) + B(I)
    END DO
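
IVDEP – what’s that?
§ “ignore vector dependencies”
    A(INDARR(I)) = A(INDARR(I)) + B(I)
§ Parallelization
  § Processor 0: A[INDARR[1]] = …
  § Processor 1: A[INDARR[2]] = …
§ Possible outcomes
  § INDARR[1] == 10, INDARR[2] == 100: the processors update different elements
  § INDARR[1] == 192, INDARR[2] == 192: both processors update the same element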
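
The two outcomes can be simulated in Python. The sketch below compares the sequential loop with a simplified model of parallel execution in which all loads happen before any store; that model, the array size and the values in B are assumptions for illustration, and indices are used directly as Python list positions.

```python
def scalar_loop(a, indarr, b):
    """Sequential semantics: each iteration sees earlier updates."""
    a = list(a)
    for i in range(len(indarr)):
        a[indarr[i]] = a[indarr[i]] + b[i]
    return a


def vector_loop(a, indarr, b):
    """Simplified model of ignoring dependencies: load all, then store all."""
    a = list(a)
    loaded = [a[idx] for idx in indarr]         # all reads happen first
    for i, idx in enumerate(indarr):            # then all writes
        a[idx] = loaded[i] + b[i]
    return a


A = [0] * 200
B = [1, 2]

# Distinct indices: both versions agree.
print(scalar_loop(A, [10, 100], B) == vector_loop(A, [10, 100], B))

# Same index twice: the updates interfere.
print(scalar_loop(A, [192, 192], B)[192], vector_loop(A, [192, 192], B)[192])
```

When INDARR contains the same index twice, one update is lost in the parallel model, so the directive is only safe if the programmer knows the indices are distinct.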

Maximal munch limitations
§ Does not work for all programming languages
§ Example C program segment
    int j, k;
    int* kaddr;
    int** kkaddr;
    kaddr = &k;
    j = *kaddr + 2;
    kkaddr = &kaddr;
    j = **kkaddr + 3;      /* two dereferences */
    k = 5 * * * kkaddr;    /* 5 times **kkaddr */
    j = 7 * * kaddr;       /* 7 times *kaddr */
§ Token: “**” or Token: “*”?
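
What can be done?
§ Close(r) coupling between lexer and parser
[Figure: the lexer reads the input program. The parser requests a token from the lexer, e.g. Id or “*” (the type of token expected, or a list of types expected), and the lexer returns a token.]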

Is w ∈ L(G)?
§ Recall: given G and w, we want to know if w ∈ L(G)
§ Approach: Find a derivation
  § S ⇒ α ⇒ … ⇒ w
§ Two principal approaches
  § Start with S (start symbol), work towards w
    § Guess what production will lead to w
    § “Top-down” parsing
      S ⇒ … ⇒ … ⇒ w
  § Start with w and try to find a way to get back to S
    § Guess how w was generated
    § “Bottom-up” parsing
      w ⇐ … ⇐ … ⇐ S

3.3 “Top down” parsing
§ Given w ∈ T* and context-free grammar G(S, T, NT, P), is w ∈ L(G)?
§ Top-down: find a derivation S ⇒ … ⇒ w
  § Want to find a left-most derivation
  § Process input from left to right
§ Languages described by a context-free grammar can be recognized by a stack machine
  § w recognized ⇔ w ∈ L(G)
  § Get derivation for free (sequence of actions by stack machine)
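
Simple stack machine
[Figure: the parser control reads the input string a + b $ through the input pointer ip ($ is the end-of-input marker) and works on a stack with $ at the bottom; the stack pointer sp marks the top of stack (TOS).]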

Actions
§ Error
  § w ∉ L(G)
§ Accept
  § w ∈ L(G)
§ Match
  § Consume: Remove from input, advance input pointer
  § Pop stack
§ Reduction
  § Use production to expand/contract the top of the stack

Parser decisions
§ Parser must decide based on top of stack and current input
§ Current input
  § Either the next token
  § Or some number k of remaining tokens

Trivial grammar
§ Start symbol S
§ Terminals: {Id, +, -, *, /}
§ Non-terminals: {S, Op}
§ Productions
  S  → Id Op Id   (1)
     | - Id       (2)
  Op → +          (3)
     | -          (4)
     | *          (5)
     | /          (6)
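
For this trivial grammar the next token alone is enough to choose a production, so the stack machine can be sketched in Python with a small table. The table layout and the names `TABLE` and `accepts` are assumptions for illustration; the actions (error, accept, match, expand) follow the slides.

```python
TERMINALS = {"Id", "+", "-", "*", "/"}

# (non-terminal on top of stack, next token) -> right-hand side to push.
TABLE = {
    ("S", "Id"): ["Id", "Op", "Id"],    # production (1)
    ("S", "-"): ["-", "Id"],            # production (2)
    ("Op", "+"): ["+"],                 # production (3)
    ("Op", "-"): ["-"],                 # production (4)
    ("Op", "*"): ["*"],                 # production (5)
    ("Op", "/"): ["/"],                 # production (6)
}


def accepts(tokens):
    """Run the stack machine; True means w is in L(G)."""
    tokens = list(tokens) + ["$"]       # $ is the end-of-input marker
    stack = ["$", "S"]                  # top of stack is the end of the list
    ip = 0                              # input pointer
    while True:
        top, current = stack[-1], tokens[ip]
        if top == "$" and current == "$":
            return True                             # accept
        if top in TERMINALS:
            if top != current:
                return False                        # error
            stack.pop()                             # match: pop and consume
            ip += 1
        elif (top, current) in TABLE:
            stack.pop()                             # expand the non-terminal
            stack.extend(reversed(TABLE[(top, current)]))
        else:
            return False                            # error


print(accepts(["Id", "+", "Id"]), accepts(["-", "Id"]), accepts(["Id", "Id"]))
```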

Parser decisions
§ Parser must decide based on top of stack and current input
§ Current input
  § Either the next token
  § Or some number k of remaining tokens
§ How can we control the parser?
  § Must be sure that w ∉ L(G) if we say there is no derivation

Grammars & words
§ Words are finite
§ Grammars are finite
  § Finite alphabets
  § Finite number of productions
§ Try until you succeed

3.3.1 Backtracking parsers
§ Given grammar G, word w
§ Basic idea
  § Start with S
  § Given state of stack, rest of input
  § Can we match, consume & pop a symbol?
    § Yes: Do it
    § No: Can we apply a production to the non-terminal X on top of the stack?
      § Yes: Do it
      § No: stuck, continue with undo
  § Undo: undo the last step and try another production
    § Either for X, or (if there are no choices left)
    § For the non-terminal that was replaced in the previous step
    § May have to restore input

Backtracking
§ Accept if stack is empty and all input consumed
§ Reject if there are no more choices to try
§ Implementation easy
§ May not be efficient – but fast enough in some settings
  § Can be used for any language
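
A backtracking recognizer can be sketched in a few lines of Python. Here “undo” is implicit: the stack and the remaining input are immutable tuples, so returning from a failed recursive call restores both. The grammar encoding and the function names are assumptions; the sketch uses the trivial grammar and assumes a grammar without left recursion, since a left-recursive rule would make this naive version recurse forever.

```python
# Trivial grammar: non-terminal -> list of alternative right-hand sides.
GRAMMAR = {
    "S": [["Id", "Op", "Id"], ["-", "Id"]],
    "Op": [["+"], ["-"], ["*"], ["/"]],
}


def derives(stack, tokens):
    """True if the symbols in `stack` (top first) derive exactly `tokens`."""
    if not stack:
        return not tokens               # accept: stack empty, all input consumed
    top, rest = stack[0], stack[1:]
    if top in GRAMMAR:
        # Non-terminal: try each production in turn; a failed attempt
        # leaves `stack` and `tokens` untouched, which is the undo step.
        return any(derives(tuple(rhs) + rest, tokens) for rhs in GRAMMAR[top])
    # Terminal: match, consume and pop, or report failure to the caller.
    return bool(tokens) and tokens[0] == top and derives(rest, tokens[1:])


def accepts(tokens):
    return derives(("S",), tuple(tokens))


print(accepts(["Id", "*", "Id"]), accepts(["-", "Id"]), accepts(["Id", "-"]))
```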