vikasdalal
Introduction to Compilers
Writing Cross Compilers
[Figure: building a Mac C compiler on Unix. The Mac C compiler source code, written in Unix C, is compiled by the Unix C compiler to produce a Mac C compiler usable on Unix. That compiler then compiles the same source code to produce a Mac C compiler usable on Mac.]
Writing Retargetable Compilers
Two methods:
1- Make a strict distinction between front end and back end, then use different back ends.
2- Generate code for a virtual machine, then build a compiler or interpreter to translate virtual machine code to a specific machine code.
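The virtual machine method can be sketched in C as follows. This is only an illustration under stated assumptions: the instruction set (OP_PUSH, OP_ADD, OP_MUL, OP_HALT) and the function vm_run are hypothetical names invented here, not part of any real virtual machine. The idea is that the front end emits this machine-independent code once, and only the small interpreter needs to be rebuilt for each target.

```c
#include <stddef.h>

/* Hypothetical instruction set of a tiny stack-based virtual machine. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

struct instr {
    enum opcode op;
    int operand;            /* used only by OP_PUSH */
};

/* Interpret VM code until OP_HALT and return the value on top of the stack.
   Assumes a well-formed program (no stack underflow or overflow). */
int vm_run(const struct instr *code)
{
    int stack[64];
    size_t sp = 0;          /* index of the next free stack slot */

    for (size_t pc = 0; ; pc++) {
        switch (code[pc].op) {
        case OP_PUSH:
            stack[sp++] = code[pc].operand;
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            return stack[sp - 1];
        }
    }
}
```

For example, the expression 2 + 3 * 4 could be emitted as PUSH 2, PUSH 3, PUSH 4, MUL, ADD, HALT and then run unchanged on any machine that has this interpreter.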
Bootstrapping
The process of writing a compiler (or assembler) in the target programming language which it is intended to compile.
Applying this technique leads to a self-hosting compiler.
Many compilers for many programming languages are bootstrapped, including compilers for BASIC, ALGOL, C, Pascal, PL/I, Factor, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Java, Python, Scala, Nimrod, Eiffel, and more.
Formal Languages
Already studied
Roles of Scanner
Removal of comments
Case conversion
Removal of white spaces
  Blanks, tabs, carriage returns and line feeds
Interpretation of compiler directives
  #include, #ifdef, #ifndef and #define are directives to redirect the input of the compiler
  May be done by a precompiler
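Two of these roles, removal of comments and removal of white space, can be sketched in C as below. The function name strip_source is hypothetical, and the sketch makes simplifying assumptions: only C-style /* ... */ comments are recognized, string literals get no special treatment, and case conversion and directives are not handled.

```c
#include <ctype.h>
#include <stddef.h>

/* Copy src to dst, dropping C-style comments and collapsing every run of
   blanks, tabs, carriage returns and line feeds into a single blank.
   A comment counts as a separator, like white space.
   dst must have room for strlen(src) + 1 bytes. */
void strip_source(const char *src, char *dst)
{
    size_t i = 0, j = 0;

    while (src[i] != '\0') {
        int separator = 0;

        if (src[i] == '/' && src[i + 1] == '*') {
            i += 2;                          /* skip the comment opener */
            while (src[i] != '\0' && !(src[i] == '*' && src[i + 1] == '/'))
                i++;
            if (src[i] != '\0')
                i += 2;                      /* skip the comment closer */
            separator = 1;
        } else if (isspace((unsigned char)src[i])) {
            i++;
            separator = 1;
        } else {
            dst[j++] = src[i++];
        }
        /* Emit at most one blank per run, and none at the very start. */
        if (separator && j > 0 && dst[j - 1] != ' ')
            dst[j++] = ' ';
    }
    dst[j] = '\0';
}
```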
Token: An element of the lexical definition of the language.
Lexeme: A sequence of characters identified as a token.
Pattern: The rule that describes the set of strings associated with a token.
Regular Languages and Regular Expressions
Studied in Theory of Computation
Possible Implementations
Lexical Analyzer Generator (e.g. Lex)
  + Safe, quick
  - Must learn the software; unable to handle unusual situations
Table-Driven Lexical Analyzer
  + General and adaptable method; the same function can be used for all table-driven lexical analyzers
  - Building the transition table can be tedious and error-prone
Handwritten
  + Can be optimized; can handle any unusual situation; easy to build for most languages
  - Error-prone; not adaptable or maintainable
Design of a Lexical Analyzer
Steps
1- Construct a set of regular expressions (REs) that define the form of all valid tokens
2- Derive an NDFA from the REs
3- Derive a DFA from the NDFA
4- Translate to a state transition table
5- Implement the table
6- Implement the algorithm to interpret the table
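Steps 4 to 6 can be sketched in C for a single token. The sketch assumes a hypothetical identifier token (a letter followed by letters or digits); the names char_class, transition, classify and match_identifier are invented for illustration. Only the table would change for a different token set; the driver loop stays the same.

```c
#include <ctype.h>

/* Character classes used as table columns. */
enum char_class { C_LETTER, C_DIGIT, C_OTHER, N_CLASSES };

/* DFA states: S_IN_ID is accepting, S_ERROR is a dead state. */
enum state { S_START, S_IN_ID, S_ERROR, N_STATES };

/* Steps 4 and 5: the state transition table, indexed by [state][class]. */
static const enum state transition[N_STATES][N_CLASSES] = {
    /*              letter    digit     other   */
    /* S_START */ { S_IN_ID,  S_ERROR,  S_ERROR },
    /* S_IN_ID */ { S_IN_ID,  S_IN_ID,  S_ERROR },
    /* S_ERROR */ { S_ERROR,  S_ERROR,  S_ERROR },
};

static enum char_class classify(char c)
{
    if (isalpha((unsigned char)c)) return C_LETTER;
    if (isdigit((unsigned char)c)) return C_DIGIT;
    return C_OTHER;
}

/* Step 6: generic driver that interprets the table. Returns the length of
   the longest prefix of s that is an identifier, or 0 if there is none. */
int match_identifier(const char *s)
{
    enum state st = S_START;
    int i = 0, last_accept = 0;

    while (s[i] != '\0') {
        st = transition[st][classify(s[i])];
        if (st == S_ERROR)
            break;
        i++;
        if (st == S_IN_ID)
            last_accept = i;
    }
    return last_accept;
}
```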
Specification of tokens
Regular expressions are an important notation for specifying patterns.
Rules to define regular expressions
Limitations of regular expressions
  They cannot describe balanced or nested constructs.
  Repeating strings cannot be described, e.g. {wcw | w is a string of a's and b's}
Regular Expressions
{ } : { }
s : {s | s in s^}
a : {a}
r | s : {r | r in r^} or {s | s in s^}
s* : {s^n | s in s^ and n >= 0}
s+ : {s^n | s in s^ and n >= 1}
id -> letter(letter|digit)*
Num -> digit+(.digit+)?(E(+|-)?digit+)?
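The Num pattern can be checked by hand with a short C function. This is a sketch written directly from the regular expression digit+(.digit+)?(E(+|-)?digit+)?; the names count_digits and match_num are hypothetical. Each optional part is taken only if it is complete, so "7." matches just "7".

```c
#include <ctype.h>

/* Number of consecutive digits at the start of s. */
static int count_digits(const char *s)
{
    int n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Length of the longest prefix of s that matches
   digit+ (. digit+)? (E (+|-)? digit+)?   or 0 if there is none. */
int match_num(const char *s)
{
    int len = count_digits(s);              /* digit+ */
    if (len == 0)
        return 0;

    if (s[len] == '.') {                    /* optional fraction */
        int frac = count_digits(s + len + 1);
        if (frac > 0)
            len += 1 + frac;
    }
    if (s[len] == 'E') {                    /* optional exponent */
        int p = len + 1;
        if (s[p] == '+' || s[p] == '-')
            p++;
        int digits = count_digits(s + p);
        if (digits > 0)
            len = p + digits;
    }
    return len;
}
```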
Recognition of tokens
Transition diagrams:
As an intermediate step in the construction of a lexical analyzer, we produce a stylized flowchart, called a transition diagram.
[Figure: Transition diagram for identifiers and keywords. Start state 9 goes to state 10 on a letter; state 10 loops on letter or digit; on any other character it goes to state 11, which performs return(gettoken(), install_id()).]
Implementing a transition diagram
A sequence of transition diagrams can be converted into a program to look for the tokens specified by the diagrams. Program size is proportional to the number of states and edges in the diagrams.
[Figure: Transition diagram for numbers. Start state 25 goes to state 26 on a digit; state 26 loops on digit; on any other character it goes to state 27.]
C code for Lexical Analyzer is:
    token nexttoken()
    {
        while (1) {
            switch (state) {
            case 0:
                c = nextchar();    /* c is lookahead character */
                if (c == blank || c == tab || c == newline) {
                    state = 0;
                    lexeme_beginning++;    /* advance beginning of lexeme */
                }
                else if (c == '<') state = 1;
                else if (c == '=') state = 5;
                else if (c == '>') state = 6;
                else state = fail();
                break;
            /* cases 1-8 here */
            case 9:
                c = nextchar();
                if (isletter(c)) state = 10;
                else state = fail();
                break;
            case 10:
                c = nextchar();
                if (isletter(c)) state = 10;
                else if (isdigit(c)) state = 10;
                else state = 11;
                break;
            case 11:
                retract(1);    /* give back the lookahead character */
                install_id();
                return ( gettoken() );
            ... /* cases 12-24 here */
            case 25:
                c = nextchar();
                if (isdigit(c)) state = 26;
                else state = fail();
                break;
            case 26:
                c = nextchar();
                if (isdigit(c)) state = 26;
                else state = 27;
                break;
            case 27:
                retract(1);    /* give back the lookahead character */
                install_num();
                return ( NUM );
            }
        }
    }
gettoken()
Looks for the lexeme in the symbol table. If the lexeme is a keyword, the corresponding token is returned; otherwise the token id is returned.
install_id()
Has access to the buffer, where the identifier lexeme is located.
The symbol table is examined; if the lexeme is found and marked as a keyword, it returns 0.
If the lexeme is found and is a program variable, it returns a pointer to the symbol table entry.
If the lexeme is not found in the symbol table, it is installed as a variable and a pointer to the newly created entry is returned.
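The three cases of install_id() can be sketched in C as below. Several details are assumptions made for illustration only: the lexeme is passed as a parameter instead of being read from the input buffer, the symbol table is a small fixed array searched linearly, and the preloaded keywords "if" and "else" are arbitrary choices.

```c
#include <string.h>

#define MAX_SYMBOLS 64
#define MAX_LEXEME  32

struct symbol {
    char lexeme[MAX_LEXEME];
    int  is_keyword;
};

/* Symbol table preloaded with two keywords (an illustrative choice). */
static struct symbol symtab[MAX_SYMBOLS] = {
    { "if", 1 }, { "else", 1 },
};
static int n_symbols = 2;

/* Returns NULL (0) if lexeme is a keyword. Otherwise returns a pointer to
   its symbol table entry, installing it as a variable when it is not yet
   present. Assumes the table is not full and that lexemes are shorter
   than MAX_LEXEME characters. */
struct symbol *install_id(const char *lexeme)
{
    for (int i = 0; i < n_symbols; i++) {
        if (strcmp(symtab[i].lexeme, lexeme) == 0)
            return symtab[i].is_keyword ? NULL : &symtab[i];
    }
    struct symbol *entry = &symtab[n_symbols++];
    strncpy(entry->lexeme, lexeme, MAX_LEXEME - 1);
    entry->lexeme[MAX_LEXEME - 1] = '\0';
    entry->is_keyword = 0;
    return entry;
}
```

A production symbol table would more likely use a hash table; the linear search keeps the sketch short.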
install_num()
Derive NDFA from REs
Could derive a DFA from the REs directly, but:
  Much easier to build an NDFA, then derive the DFA
  No standard way of deriving DFAs from REs
Use Thompson's construction (Louden's)
[Figure: NDFA diagram; surviving edge labels: letter, letter, digit, letter]
Derive DFA from NDFA
Use subset construction (Louden's)
May be optimized
Easier to implement:
  No ε edges
  Deterministic (no backtracking)
[Figure: DFA diagram; surviving edge labels: letter, digit, [other]]
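The subset construction can be sketched in C using bitmasks for sets of NDFA states. The NDFA below is a small hand-built one for letter(letter|digit)*, chosen for illustration; it is an assumption and not necessarily the automaton drawn on the slide. All names (eps, delta, eps_closure, move_set, subset_construction) are hypothetical.

```c
#define N_NFA   4        /* NDFA states 0..3; state 3 is accepting */
#define N_SYM   2        /* input symbols: 0 = letter, 1 = digit   */
#define MAX_DFA 16       /* at most 2^N_NFA subsets                */

/* A set of NDFA states: bit s is set when state s is in the set. */
typedef unsigned set_t;

/* Epsilon edges: 1 --eps--> 2 and 1 --eps--> 3. */
static const set_t eps[N_NFA] = { 0, (1u << 2) | (1u << 3), 0, 0 };

/* Labeled edges: 0 --letter--> 1, 2 --letter--> 1, 2 --digit--> 1. */
static const set_t delta[N_NFA][N_SYM] = {
    { 1u << 1, 0       },
    { 0,       0       },
    { 1u << 1, 1u << 1 },
    { 0,       0       },
};

/* All states reachable from set through epsilon edges alone. */
static set_t eps_closure(set_t set)
{
    set_t prev;
    do {
        prev = set;
        for (int s = 0; s < N_NFA; s++)
            if (set & (1u << s))
                set |= eps[s];
    } while (set != prev);
    return set;
}

/* States reachable from set by consuming one symbol. */
static set_t move_set(set_t set, int sym)
{
    set_t out = 0;
    for (int s = 0; s < N_NFA; s++)
        if (set & (1u << s))
            out |= delta[s][sym];
    return out;
}

set_t dfa_states[MAX_DFA];          /* each DFA state is a set of NDFA states */
int   dfa_trans[MAX_DFA][N_SYM];    /* -1 means no transition */

/* Builds the DFA and returns its number of states. */
int subset_construction(void)
{
    int n = 0;
    dfa_states[n++] = eps_closure(1u << 0);         /* DFA start state */

    for (int d = 0; d < n; d++) {                   /* n grows as states are found */
        for (int sym = 0; sym < N_SYM; sym++) {
            set_t target = eps_closure(move_set(dfa_states[d], sym));
            dfa_trans[d][sym] = -1;
            if (target == 0)
                continue;
            int t = 0;
            while (t < n && dfa_states[t] != target)
                t++;
            if (t == n)
                dfa_states[n++] = target;           /* new DFA state */
            dfa_trans[d][sym] = t;
        }
    }
    return n;
}
```

For this NDFA the result has two DFA states, {0} and {1, 2, 3}; the second contains the accepting NDFA state and loops on both letter and digit, with no ε edges left.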
Implementation Concerns
Backtracking
  Principle: A token is normally recognized only when the next character is read.
  Problem: This character may be part of the next token.
  Example: x
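One common way to handle this, sketched here in C under stated assumptions, is to keep the input in a buffer and step back one position after reading a character that does not belong to the current token. The helper names nextchar and retract follow the C code for the lexical analyzer earlier in these slides; set_input, peek_next and scan_identifier, and the whole buffer layout, are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

static const char *input;    /* text being scanned                 */
static size_t forward;       /* index of the next unread character */

void set_input(const char *text) { input = text; forward = 0; }

/* Character at the current position, without consuming it. */
char peek_next(void) { return input[forward]; }

/* Read one character and advance. */
static char nextchar(void) { return input[forward++]; }

/* Step back n characters so they are read again by the next token. */
static void retract(size_t n) { forward -= n; }

/* Scan one identifier (a letter followed by letters or digits) at the
   current position. Returns its length, or 0 if none starts here. */
size_t scan_identifier(void)
{
    size_t begin = forward;
    char c = nextchar();

    if (!isalpha((unsigned char)c)) {
        retract(1);
        return 0;
    }
    do {
        c = nextchar();
    } while (isalnum((unsigned char)c));

    retract(1);              /* c belongs to the next token: give it back */
    return forward - begin;
}
```

With the hypothetical input "count+1", the scanner must read '+' to know the identifier has ended; after retract(1) the '+' is still available as the start of the next token.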
Ambiguity
  Problem: The lexemes of some tokens are subsets of other tokens.
  Example: n-1. Is it the three tokens n, -, 1 or the two tokens n, -1?
  Solutions:
    Postpone the decision to the syntactic analyzer
    Do not allow a sign prefix on numbers in the lexical specification
    Interact with the syntactic analyzer to find a solution (induces coupling)
Example
Alphabet: {:, *, =, (, ), <, >, {, }, [a..z], [0..9]}
Simple tokens: {(, ), {, }, :, <, >}
Composite tokens: {:=, >=, ..., (*, *)}
Ambiguity problems:
Character    Possible tokens
:            :, :=
>            >, >=
<
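One common resolution for the ':' and '>' rows of the table is to read one extra character and prefer the longer token. That longest-match policy is an assumption here, not something the slides state. A minimal C sketch with hypothetical token names and a hypothetical function scan_operator:

```c
/* Hypothetical token codes for the operators in the table. */
enum token { T_COLON, T_ASSIGN, T_GT, T_GE, T_UNKNOWN };

/* Recognize one operator at the start of s using one character of
   lookahead and preferring the longer token. *len receives the number
   of characters consumed (0 when no operator is recognized). */
enum token scan_operator(const char *s, int *len)
{
    *len = 1;
    switch (s[0]) {
    case ':':
        if (s[1] == '=') { *len = 2; return T_ASSIGN; }   /* := */
        return T_COLON;                                   /* :  */
    case '>':
        if (s[1] == '=') { *len = 2; return T_GE; }       /* >= */
        return T_GT;                                      /* >  */
    default:
        *len = 0;
        return T_UNKNOWN;
    }
}
```

Because the second character is only inspected and never consumed unless it completes the longer token, no retraction is needed in this sketch.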