Compiler Design Spring 2017 3.0 Frontend Dr. Zoltán Majó Compiler Group – Java HotSpot Virtual Machine Oracle Corporation 1

Compiler Design - ETH Zpeople.inf.ethz.ch/zmajo/teaching/cd_ss17/slides/w03_02-lexical... · §Compiler does not know how to interpret “a + b * c ... § We will look at those in



Different parse trees

§ There are grammars that allow more than one right-most derivation for w ∈ L(G)
  § (Or more than one left-most derivation)

§ Example (right-most)
  § Derivation #1: S → E → E Op E → E Op E Op E → E Op E Op Id → E Op E * Id → E Op Id * Id → E + Id * Id → Id + Id * Id
  § Derivation #2: S → E → E Op E → E Op Id → E * Id → E Op E * Id → E Op Id * Id → E + Id * Id → Id + Id * Id

    S  → E                (1)
    E  → E Op E           (2)
       | - E              (3)
       | ( E )            (4)
       | Id               (5)
    Op → + | - | * | /    (6)
    Id: L { L | N }*

Derivations and parse trees

Tree #1: S( E( E( E(Id) Op(+) E(Id) ) Op(*) E(Id) ) )

Tree #2: S( E( E(Id) Op(+) E( E(Id) Op(*) E(Id) ) ) )

3.1.3 Ambiguity

§ A grammar that allows more than one parse tree for at least one w ∈ L(G) is called ambiguous

§ Note: Ambiguity is a property of the grammar
  § We will give an unambiguous grammar for expressions later

§ We need to compare parse trees (and derivations)
  § Comparing derivations is easy if only left-most (or only right-most) derivations are used

§ Alternative definition: A grammar that allows more than one (right|left)-most derivation for at least one w ∈ L(G) is called ambiguous
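The definition can be checked mechanically for a single word. The following is a minimal sketch, assuming the expression grammar restricted to E → E Op E | Id; the name `count_trees` is made up for illustration. It counts the parse trees of a token sequence, and a count above one for any single word shows that the grammar is ambiguous.

```python
from functools import lru_cache

OPS = {"+", "-", "*", "/"}

def count_trees(tokens):
    """Count parse trees for E -> E Op E | Id over a token list."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways toks[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if toks[lo] == "Id" else 0
        total = 0
        # Try every operator position as the root production E -> E Op E.
        for mid in range(lo + 1, hi - 1):
            if toks[mid] in OPS:
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(toks))

print(count_trees(["Id", "+", "Id", "*", "Id"]))  # 2
```

The two trees counted for Id + Id * Id are the ones with "*" at the root and with "+" at the root.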

Problems w/ ambiguity

§ Compiler does not know how to interpret “a + b * c”
  § Is it Tree #1? I.e., (a + b) * c
  § Or is it Tree #2? I.e., a + (b * c)

§ What can we do?

Addressing ambiguity

§ Change the grammar
  § See later for better grammar
  § May not always be possible

§ Change language
  § Add rules that “*” binds more strongly than “+”
    § Precedence
    § Resolves conflicts

§ Bad idea: Let the compiler (writer) decide
  § Or let the user worry
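One way to apply a precedence rule is precedence climbing. This is a sketch under stated assumptions, not the method used in the course: the table `PREC`, the function `parse_expr`, the pre-split token list, and left associativity for operators of equal precedence are all choices made here for illustration. The input is assumed to be well formed, so there is no error handling.

```python
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}  # assumed precedence levels

def parse_expr(tokens, pos=0, min_prec=1):
    """Parse tokens[pos:] into a nested tuple; return (tree, next_pos)."""
    left = tokens[pos]          # operand (an identifier)
    pos += 1
    while pos < len(tokens) and PREC.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        # A stronger operator to the right claims the operand first;
        # PREC[op] + 1 makes operators of equal precedence left-associative.
        right, pos = parse_expr(tokens, pos + 1, PREC[op] + 1)
        left = (op, left, right)
    return left, pos

tree, _ = parse_expr(["a", "+", "b", "*", "c"])
print(tree)  # ('+', 'a', ('*', 'b', 'c'))
```

With this rule, “a + b * c” is always read as Tree #2, a + (b * c), so the conflict is resolved without changing the grammar.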

Another example

§ “If” statement

§ Two forms
  § if (Condition) then (Body)
  § if (Condition) then (Body) else (Body)

Another example

§ Start symbol: S

§ Productions
    S         → stmt-list S | stmt-list
    stmt-list → … | if-stmt
    if-stmt   → if cond-expr then S |
                if cond-expr then S else S
    cond-expr → …

§ Other statements (assign, function call, …) omitted


Ambiguous languages

§ Ambiguity is a property of the grammar
  § One word is enough to show ambiguity
  § How do you show that a grammar is not ambiguous?
    § Proof (for one grammar)
    § Some kinds of grammars are certified unambiguous
      § We will look at those in compiler design

§ Unfortunately there are languages that are inherently ambiguous
  § All grammars that generate such a language are ambiguous
  § Even for Type-2 (context free) grammars

Transition from parse tree to IR

§ Parse tree
  § Sometimes called concrete syntax tree
  § Interior nodes represent non-terminals

§ Our tree-based IR: Abstract syntax tree
  § Interior nodes represent programming constructs
  § Non-terminals not (directly) preserved
  § Structure close to that of the parse tree

§ Building IR: Via derivations or separate transformation step

(Slide corrected after lecture – zmajo)

Parse tree vs IR

Concrete syntax tree: S( E( E(Id a7) Op(+) E(Id b) ) )

Abstract syntax tree (IR): +( VAR a7, VAR b )
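The step from concrete syntax tree to abstract syntax tree can be sketched as a small tree rewrite. The nested-tuple encoding and the function name `to_ast` are assumptions made for this illustration, and only the productions S → E, E → E Op E and E → Id are handled.

```python
def to_ast(node):
    """Convert a concrete syntax tree (nested tuples) into an AST."""
    label, *children = node
    if label == "Id":
        return ("VAR", children[0])          # leaf: variable reference
    if label == "S":
        return to_ast(children[0])           # S -> E adds no information
    if label == "E":
        if len(children) == 1:
            return to_ast(children[0])       # E -> Id
        left, op, right = children           # E -> E Op E
        # The operator becomes the interior node; non-terminals disappear.
        return (op[1], to_ast(left), to_ast(right))
    raise ValueError(f"unexpected node {label!r}")

cst = ("S", ("E", ("E", ("Id", "a7")), ("Op", "+"), ("E", ("Id", "b"))))
print(to_ast(cst))  # ('+', ('VAR', 'a7'), ('VAR', 'b'))
```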

Summary

§ Frontend performs two tasks
  § Break input into tokens
  § Analyze whether that sequence of tokens is legal input
    § Find derivation S ⇒* w

§ Goal: produce IR

§ Parse trees capture derivations

§ Our IR is tree-based, so the step from parse tree to IR tree is not that large

Compiler Design Spring 2017

3.2 Lexical analysis
3.3 Top-down parsing

Dr. Zoltán Majó

Compiler Group – Java HotSpot Virtual Machine
Oracle Corporation

Overview

§ 3.1 Introduction

§ 3.2 Lexical analysis

§ 3.3 “Top down” parsing

§ 3.4 “Bottom up” parsing

Outline

§ Using multiple grammars to save work and simplify
  § Use in compiler front-end

§ Top-down parsing
  § Simple backtracking parsers
  § Simple predictive parsers

3.2 Lexical analysis

§ Use regular expressions to describe elements of the language
  § Names of variables, fields, methods, classes, …
  § Constants (int, float, double, hex, …)
  § Keywords of the language (if then else while class …)
  § Example (from previous lecture):
    § Id: L { L | N }*
    § L = { a | b | c | … | z }
    § N = { 0 | 1 | 2 | … | 9 }

§ Regular expressions → DFA
  § Automatic construction easy
  § DFA produces the tokens
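For the Id example, the resulting DFA is small enough to write by hand. The sketch below is hand-coded rather than generated from the regular expression, and the state names `start` and `in_id` as well as the function name `is_identifier` are made up for illustration.

```python
import string

LETTERS = set(string.ascii_lowercase)   # L = {a | b | ... | z}
DIGITS = set(string.digits)             # N = {0 | 1 | ... | 9}

def is_identifier(text):
    """Run a two-state DFA for Id: L{L|N}* over text."""
    state = "start"
    for ch in text:
        if state == "start" and ch in LETTERS:
            state = "in_id"               # first character must be a letter
        elif state == "in_id" and (ch in LETTERS or ch in DIGITS):
            state = "in_id"               # further letters or digits
        else:
            return False                  # no transition defined: reject
    return state == "in_id"               # in_id is the only accepting state

print(is_identifier("a3"), is_identifier("3a"))  # True False
```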

How it works

[Figure: Source program (“a 3+b”, “b+3a”) → Lexer (or scanner), a DFA → Tokens: Id(b) Term(+) Id(a3) → Parser (analyzer) → Yes / No]


Token assembly

§ First (part of) answer: Stop when encountering a character that does not belong to current token

§ For many languages: Stop when encountering whitespace

§ Whitespace: Invisible and/or irrelevant for program
  § Look at C, C++, Java
    § <space> ␣
    § Newline, form feed, CR (carriage return)
    § Tab
    § Comments
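Both stopping rules can be seen in a very small scanner. This is a sketch, assuming only identifiers and the four arithmetic operators as tokens; the function name `tokenize` and the `(kind, text)` token shape are choices made here, and comments are not handled. Note that Python's `isalpha` and `isalnum` accept more characters than the sets L and N.

```python
def tokenize(source):
    """Split source into (kind, text) tokens, skipping whitespace."""
    tokens, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1                         # whitespace only separates tokens
        elif ch.isalpha():
            start = i
            # Keep reading while the character still belongs to the Id.
            while i < len(source) and source[i].isalnum():
                i += 1
            tokens.append(("Id", source[start:i]))
        elif ch in "+-*/":
            tokens.append(("Term", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at {i}")
    return tokens

print(tokenize("a3+b"))  # [('Id', 'a3'), ('Term', '+'), ('Id', 'b')]
```

The identifier `a3` ends at `+` because `+` cannot belong to an Id, not because of whitespace.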

Comments and whitespace

§ Some languages attach meaning to whitespace
  § Nesting level in Python
  § “make” utility
  § Warning: macro facilities, pragma

§ Not all comments are whitespace
  § Directives hidden in comments
  § Example: Fortran 90 comments start with “!”

    !DEC$ IVDEP        – ignore vector dependencies
    DO I=1, N
      A(INDARR(I)) = A(INDARR(I)) + B(I)
    END DO

IVDEP – what’s that?

§ “ignore vector dependencies”
    A(INDARR(I)) = A(INDARR(I)) + B(I)

§ Parallelization
  § Processor 0: A[INDARR[1]] = …
  § Processor 1: A[INDARR[2]] = …

§ Possible outcomes
  § INDARR[1] == 10, INDARR[2] == 100
  § INDARR[1] == 192, INDARR[2] == 192
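Why the directive matters can be sketched by comparing the loop as written with a simplified model of a vectorized execution. The functions `sequential` and `vectorized` are illustrative names, the load-all, add-all, store-all model is an assumption about how a vector unit might run the loop, and the index values are used as 0-based Python positions rather than Fortran indices.

```python
def sequential(a, indarr, b):
    """Run the loop one iteration at a time, as written."""
    a = list(a)
    for i in range(len(indarr)):
        a[indarr[i]] = a[indarr[i]] + b[i]
    return a

def vectorized(a, indarr, b):
    """Model a vectorized loop: load all, add all, then store all."""
    a = list(a)
    loaded = [a[idx] for idx in indarr]              # gather old values
    sums = [x + y for x, y in zip(loaded, b)]        # add element-wise
    for idx, s in zip(indarr, sums):                 # scatter the results
        a[idx] = s
    return a

a = [0] * 200
b = [1, 2]
print(sequential(a, [10, 100], b) == vectorized(a, [10, 100], b))    # True
print(sequential(a, [192, 192], b) == vectorized(a, [192, 192], b))  # False
```

When both entries of INDARR are 192, the second iteration depends on the first, and the modeled vector execution loses one update. The directive tells the compiler to assume this never happens.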


Maximal munch limitations

§ Does not work for all programming languages

§ Example C program segment

    int j, k;
    int* kaddr;
    int** kkaddr;
    kaddr = & k;
    j = *kaddr + 2;
    kkaddr = & kaddr;
    j = **kkaddr + 3;
    k = 5 * * * kkaddr;
    j = 7 * * kaddr;

  Callouts: Token: “**”, Token: “*”

What can be done?

§ Close(r) coupling between lexer and parser

[Figure: Input program → Lexer → Token → Parser. The parser requests Id, “*” (type of token expected or a list of types expected) from the lexer.]
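A closer coupling could look like the following sketch, in which the parser passes the set of token types it expects to the lexer. The class `Lexer`, its method `next_token`, and the assumption of a language that has both a “**” and a “*” token are hypothetical and chosen only to illustrate the idea.

```python
class Lexer:
    """Hypothetical lexer that only produces token types the parser asks for."""

    OPERATORS = ["**", "*"]                # longest candidate first

    def __init__(self, source):
        self.source = source
        self.pos = 0

    def next_token(self, expected):
        """Return the longest token at the current position whose type is expected."""
        while self.pos < len(self.source) and self.source[self.pos].isspace():
            self.pos += 1
        rest = self.source[self.pos:]
        for op in self.OPERATORS:
            if op in expected and rest.startswith(op):
                self.pos += len(op)
                return (op, op)
        if "Id" in expected and rest[:1].isalpha():
            end = 1
            while end < len(rest) and rest[end].isalnum():
                end += 1
            self.pos += end
            return ("Id", rest[:end])
        raise SyntaxError(f"expected one of {sorted(expected)} at {self.pos}")

lexer = Lexer("**kkaddr")
# A parser that expects a dereference asks for "*" or Id, never "**".
print([lexer.next_token({"*", "Id"}) for _ in range(3)])
# [('*', '*'), ('*', '*'), ('Id', 'kkaddr')]
```

If the parser allowed “**” as well, the same lexer would fall back to plain maximal munch and return a single “**” token.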

Overview

§ 3.1 Introduction

§ 3.2 Lexical analysis

§ 3.3 “Top down” parsing

§ 3.4 “Bottom up” parsing

Is w ∈ L(G)?

§ Recall: given G and w, want to know if w ∈ L(G)

§ Approach: Find derivation
  § S ⇒ α ⇒ … ⇒ w

§ Two principal approaches
  § Start with S (start symbol), work towards w
    § Guess what production will lead to w
    § “Top-down” parsing
        S ⇒ … … ⇒ w
  § Start with w and try to find a way to get back to S
    § Guess how w was generated
    § “Bottom-up” parsing
        w ⇐ … ⇐ … ⇐ S

3.3 “Top down” parsing

§ Given w ∈ T* and context-free grammar G(S, T, NT, P), is w ∈ L(G)?

§ Top-down: find a derivation S ⇒ … ⇒ w
  § Want to find a left-most derivation
  § Process input from left-to-right

§ Languages described by a context-free grammar can be recognized by a stack machine
  § w recognized ⇔ w ∈ L(G)
  § Get derivation for free (sequence of actions by stack machine)

Simple stack machine

[Figure: Parser control reads the input string “a + b $” through an input pointer (ip) and works on a stack through a stack pointer (sp). The stack is drawn with TOS (top of stack) and $. $ is the end of input marker.]

Actions

§ Error
  § w ∉ L(G)

§ Accept
  § w ∈ L(G)

§ Match
  § Consume: Remove from input, advance input pointer
  § Pop stack

§ Reduction
  § Use production to expand/contract the top of the stack

Parser decisions

§ Parser must decide based on top of stack and current input

§ Current input
  § Either the next token
  § Or some number k of remaining tokens

Trivial grammar

§ Start symbol S

§ Terminals: { Id, +, -, *, / }

§ Non-terminals: { S, Op }

§ Productions
    S  → Id Op Id |    (1)
         - Id          (2)
    Op → + |           (3)
         - |           (4)
         * |           (5)
         /             (6)
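For this grammar, the top of stack and the next token are enough to pick a production, so the stack machine can be sketched without any guessing. The table `TABLE` and the function `recognize` are names invented for this illustration; the sketch uses only the expand form of the reduction action and one token of lookahead.

```python
# Productions of the trivial grammar, keyed by (non-terminal, next token).
TABLE = {
    ("S", "Id"): ["Id", "Op", "Id"],   # (1)
    ("S", "-"): ["-", "Id"],           # (2)
    ("Op", "+"): ["+"],                # (3)
    ("Op", "-"): ["-"],                # (4)
    ("Op", "*"): ["*"],                # (5)
    ("Op", "/"): ["/"],                # (6)
}
NON_TERMINALS = {"S", "Op"}

def recognize(tokens):
    """Return True if tokens form a word of the trivial grammar."""
    inp = list(tokens) + ["$"]         # $ marks the end of input
    stack = ["$", "S"]                 # top of stack is the last element
    ip = 0
    while True:
        top, cur = stack[-1], inp[ip]
        if top == "$" and cur == "$":
            return True                # accept: stack and input exhausted
        if top in NON_TERMINALS:
            rhs = TABLE.get((top, cur))
            if rhs is None:
                return False           # error: no production applies
            stack.pop()
            stack.extend(reversed(rhs))  # expand: leftmost symbol ends on top
        elif top == cur:
            stack.pop()                # match: pop stack, consume input
            ip += 1
        else:
            return False               # error: terminal mismatch

print(recognize(["Id", "+", "Id"]), recognize(["Id", "Id"]))  # True False
```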


Parser decisions

§ Parser must decide based on top of stack and current input

§ Current input
  § Either the next token
  § Or some number k of remaining tokens

§ How can we control the parser?
  § Must be sure that w ∉ L(G) if we say there is no derivation

Grammars & words

§ Words are finite

§ Grammars are finite
  § Finite alphabets
  § Finite number of productions

§ Try until you succeed

3.3.1 Backtracking parsers

§ Given grammar G, word w
§ Basic idea
  § Start with S
  § Given state of stack, rest of input
    § Can we match, consume & pop a symbol?
      § Yes: Do it
      § No: Can we apply a production to non-terminal X on top of stack?
        § Yes: Do it.
        § No: stuck, continue with undo
  § Undo: undo last step and try another production
    § Either for X, or (if there are no choices left)
    § For non-terminal that was replaced in previous step
    § May have to restore input
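The basic idea can be sketched with recursion, where returning from a failed call plays the role of undo. The dictionary `GRAMMAR` encodes the trivial grammar and the function `parse` is a name made up here. This naive version assumes a grammar without left recursion; with a production such as E → E Op E it would recurse without end.

```python
GRAMMAR = {
    "S": [["Id", "Op", "Id"], ["-", "Id"]],
    "Op": [["+"], ["-"], ["*"], ["/"]],
}

def parse(tokens, stack=("S",), pos=0):
    """Return True if the stack (top first) can derive tokens[pos:]."""
    if not stack:
        return pos == len(tokens)      # accept only if all input is consumed
    top, rest = stack[0], stack[1:]
    if top in GRAMMAR:
        # Try each production for the non-terminal on top of the stack.
        for rhs in GRAMMAR[top]:
            if parse(tokens, tuple(rhs) + rest, pos):
                return True
            # Falling through is the undo: stack and pos are unchanged here.
        return False                   # no choices left for this non-terminal
    # Terminal on top: match, consume and pop, or report being stuck.
    if pos < len(tokens) and tokens[pos] == top:
        return parse(tokens, rest, pos + 1)
    return False

print(parse(["Id", "*", "Id"]), parse(["Id", "-"]))  # True False
```

Because the stack is an immutable tuple and the input position is passed by value, restoring the stack and the input after a failed attempt needs no extra bookkeeping.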


Backtracking

§ Accept if stack is empty and all input consumed

§ Reject if there are no more choices to try

§ Implementation easy

§ May not be efficient – but fast enough in some settings
  § Can be used for any language
