2_SimpleOnePassCompiler

Embed Size (px)

Citation preview

  • 8/7/2019 2_SimpleOnePassCompiler

    1/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 1/66

    2. A SIMPLE ONE-PASS COMPILERInfix expression -> Postfix expression

    Yoon-Joong KimComputer Engineering Department, Hanbat National University

  • 8/7/2019 2_SimpleOnePassCompiler

    2/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 2/66

    2.1 Overview :

    Language: Appearance(Vocabulay,Syntax), Semantics

    2.2 Syntax definition

    Alphabet, string(sentence), language, BNF, Context Free Grammar, Parse Tree, Ambiguity,

    Associativity and precedence of operator, Syntax of assignment statement.

    2.3 Syntax-directed translation :

    SDD, SDTS

    2.4 Parsing :Top-Down Parsing, Predictive Parsing(Recursive Decent Parsing), Left factoring, Elimination of left-recursion

    2.5 A translator for simple expression : Design and implementation

    2.6 Lexical analysis

    Design and implementation of as simple lexical analyzer

    2.7 Incorporating a symbol table

    2.8 A abstract stack machine

    Stack machine operation, Lvalue, Rvalue, Stack manipulation, Translation of expression,

    Translation of statements. Emitting a translation

  • 8/7/2019 2_SimpleOnePassCompiler

    3/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 3/66

    2.9 Putting the techniques together

    Design and implementation of a simple one pass translator, infix expression to postfix expression

  • 8/7/2019 2_SimpleOnePassCompiler

    4/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 4/66

    This chapter

    is an introduction to the material in Chapter 3 through 8. concentrates on the front end of compiler, that is, on lexical analysis, parsing, and

    intermediate code generation.

    2.1 OVERVIEWLanguage Definition

    - Appearance of programming language :

    Vocabulary : Regular expressionSyntax : Backus-Naur Form(BNF) or Context Free Form(CFG)

    - Semantics : Informal language or some examples

    l Fig 2.1. Structure of our compiler front end

  • 8/7/2019 2_SimpleOnePassCompiler

    5/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 5/66

    2.2 SYNTAX DEFINITION

    lTo specify the syntax of a language : CFG and BNF Example : if-else statement in C has the form of

    statement if ( expression ) statement else statementAn alphabet of a language is a set of symbols.

    Examples : {0,1} for a binary number system(language)={0,1,100,101,...}

    {a,b,c} for language={a,b,c, ac,abcc..}

    {if ,(,),else ...} for a if statements={if(a==1)goto10, if--}A string over an alphabet

    is a sequence of zero or more symbols from the alphabet.

    Examples : 0,1,10,00,11,111,0202 ... strings for a alphabet {0,1}

    Null string is a string which does not have any symbol of alphabet.

    Language

    Is a subset of all the strings over a given alphabet. Alphabets Ai Languages Li for Ai

    A0={0,1} L0={0,1,100,101,...}

  • 8/7/2019 2_SimpleOnePassCompiler

    6/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 6/66

    A1={a,b ,c} L1={a,b,c, ac, abcc..}

    A2={all of C tokens} L2= {all sentences of C program }

    Example 2.1. Grammar for expressions consisting of digits and plus and minus

    signs.

    Language of expressions L={9-5+2, 3-1, ...}

    The productions of grammar for this language L are:

    list list + digit

    list list - digit

    list digit

    digit 0|1|2|3|4|5|6|7|8|9

    list, digit : Grammar variables, Grammar symbols

    {0,1,2,3,4,5,6,7,8,9,-,+} : Alphabet for language L, Tokens, Terminal symbolsConvention specifying grammar

    Terminal symbols : bold face string i f , num, id Nonterminal symbol, grammar symbol : italicized names, list, digit ,A,B

  • 8/7/2019 2_SimpleOnePassCompiler

    7/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 7/66

    Grammar G=(N,T,P,S)

    N : a set of nonterminal symbols

    T : a set of terminal symbols, tokens

    P : a set of production rules

    S : a start symbol, SN

    Grammar G for a language L={9-5+2, 3-1, ...}

    G=(N,T,P,S)

    N={list,digit}T={0,1,2,3,4,5,6,7,8,9,-,+} : is a kind of alphabet for L

    P : list -> list + digit list -> list - digit list -> digit

    digit -> 0 |1 |2 |3 |4 |5 |6 |7 |8 |9S= list

  • 8/7/2019 2_SimpleOnePassCompiler

    8/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 8/66

    Some definitions for a language L and its grammar G

    Derivation :A sequence of replacements S 1 2n is a derivation of n.

    Example, A derivation 1+9 from the grammar G

    left most derivation list list + digit digit + digit 1 + digit 1 + 9

    right most derivation list list + digit list + 9 digit + 9 1 + 9

    Language of grammar L(G)L(G) is a set of sentences(strings) that can be generated from the grammar G=(N,T,P,S).

    L(G)={x| S *

    x} where x a sequence of terminal symbols

    Example: Consider a grammar G=(N,T,P,S):

    N={S} T={a,b}

    S=S P ={S aSb | }

    Is aabb a sentence of L(g)? (derivation of string aabb)SaSbaaSbbaabbaabb(or S* aabb) so, aabbL(G)

    there is no derivation for aa, so aaL(G) note L(G)={anbn| n0} where anbn meas n as followed by n bs.

  • 8/7/2019 2_SimpleOnePassCompiler

    9/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 9/66Parse TreeA derivation can be conveniently represented by a derivation tree( parse tree).

    The root is labeled by the start symbol.

    Each leaf is labeled by a token or .

    Each interior nodes are labeled by a nonterminal symbol.

    When a production Ax 1 xn is derived, nodes labeled by x 1 xn are made as children

    nodes of node labeled by A.

    root : the start symbol internal nodes : nonterminal leaf nodes : terminal

  • 8/7/2019 2_SimpleOnePassCompiler

    10/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 10/66

    Example G:

    list -> list + digit | list - digit | digit

    digit -> 0|1|2|3|4|5|6|7|8|9

    left most derivation for 9-5+2,list list+digit list-digit+digit digit-digit+digit 9-digit+digit

    9-5+digit 9-5+2 right most derivation for 9-5+2,

    list list+digit list+2 list-digit+2 list-5+2 digit-5+2 9-5+2

    parse tree for 9-5+2

  • 8/7/2019 2_SimpleOnePassCompiler

    11/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 11/66AmbiguityA grammar is said to be ambiguous if the grammar has more than one parsetree for a given string of tokens.

    Example 2.5. Suppose a grammar G that can not distinguish between lists anddigits as in Example 2.1.

    G : string string + string | string - string |0|1|2|3|4|5|6|7|8|9

    1-5+2 has 2 parse trees => Grammar G is ambiguous.

  • 8/7/2019 2_SimpleOnePassCompiler

    12/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 12/66Associativity of operatorA operator is said to be left associative if an operand with operators on both sides of it is taken bythe operator to its left.

    eg) 9+5+2(9+5)+2, a=b=ca=(b=c)

    Left Associative Grammar :

    list list + digit | list - digit

    digit 0|1||9

    Right Associative Grammar :

    right letter = right | letter letter a|b||z

    left associative operators :+ , - , * , /

    right associative operators := , **

  • 8/7/2019 2_SimpleOnePassCompiler

    13/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 13/66Precedence of operatorsWe say that a operator(*) has higher precedence than other operator(+) if the operator(*) takesoperands before other operator(+) does.

    ex. 9+5*29+(5*2), 9*5+2(9*5)+2

    Syntax of full expressions

    operator associative precedence+ , - left 1 low

    * , / left 2 heigh

    left a. left a. precedence/depth

    expr expr + term | expr - term | term 1

    term term * factor | term / factor | factor 2

    factor digit | ( expr ) 3digit 0 | 1 | | 9 4

  • 8/7/2019 2_SimpleOnePassCompiler

    14/66

  • 8/7/2019 2_SimpleOnePassCompiler

    15/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 15/662.3 SYNTAX-DIRECTED TRANSLATION(SDT)A formalism for specifying translations for programming language constructs.

    ( attributes of a construct: type, string, location, etc)

    Syntax directed definition(SDD) for the translation of constructs Syntax directed translation scheme(SDTS) for specifying translation

    Postfix notation for an (infix) expression E If E is a variable or constant, then the postfix nation for E is E itself ( E.tE ).

    if E is an expression of the form E1 op E2 where op is a binary operator

    E1' is the postfix of E1, E2' is the postfix of E2 then E1' E2' op is the postfix for E1 op E2

    if E is (E1), and E 1' is a postfix

    then E1' is the postfix for E

    eg) 9 - 5 + 2 9 5 - 2 +

    9 - (5 + 2) 9 5 2 + -

  • 8/7/2019 2_SimpleOnePassCompiler

    16/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 16/66Syntax-Directed Definition(SDD) for translationSDD is a set of semantic rules predefined for each productions respectively fortranslation.

    A translation is an input-output mapping procedure for translation of an input X,

    construct a parse tree for X.

    synthesize attributes in order of depth-first traversal over the parse tree.

    Suppose a node n in parse tree is labeled by X and X.a denotes the value of attribute aof X at that node.

    compute X's attributes X.a using the semantic rules associated with X.

    Depth-First Traversalsprocedure visit(n: node)

    beginfo r each child m of n, from left to right do visit(m);

    evaluate semantic rules at node nend

  • 8/7/2019 2_SimpleOnePassCompiler

    17/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 17/66

    An example of translating input X=9-5+2 to postfix expression (synthesized attributes for input X=9-5+2)

    1. produce parse tree for the input X=9-5+22. evaluate attribute of node according to the dept-first order based on the SDD.

  • 8/7/2019 2_SimpleOnePassCompiler

    18/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 18/66Syntax-directed Translation Schemes(SDTS) A translation scheme is a context-free grammar in which program fragments called

    translation actions are embedded within the right sides of the production.

    productions(infix) SDD for postfix notation from infix SDTS for postfix notation from infixlist list + term list.t = list.t || term.t || "+" list list + term {print("+")}

    {print("+");} : translation(semantic) action.

    1. Design translation schemes(SDTS) for translation

    2. Translate :a) parse the input string x andb) emit(execute) the action result encountered during the depth-first traversal of parse

    tree.

  • 8/7/2019 2_SimpleOnePassCompiler

    19/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 19/66

    Example 2.8.

    SDD vs. SDTS for infix to postfix translation.

    productions SDD SDTSexpr list + termexpr list + termexpr termterm 0term 1

    term 9

    expr.t = list.t || term.t || "+"expr.t = list.t || term.t || "-"expr.t = term.tterm.t = "0"term.t = "1"

    term.t = "9"

    expr list + term printf{"+")}expr list + term printf{"-")}expr termterm 0 printf{"0")}term 1 printf{"1")}

    term 9 printf{"0")}

    An Example of an Action translation for input 9-5+2

    1) Parse.2) Translate.Do we have to maintain the whole parsetree ?No, Semantic actions are performed

    during parsing, and we don't need thenodes (whose semantic actions done) anymore.

  • 8/7/2019 2_SimpleOnePassCompiler

    20/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 20/662.4 PARSINGif token string x L(G), then parse tree

    else error message

    Top-Down parsing1. At node n labeled with nonterminal A, select one of the productions

    whose left part is A and construct children of node n with the symbols on the right side

    of that production.

    2. Find the next node at which a sub-tree is to be constructed.

    ex. G: type simple

    id array [ simple ] of type simple integer char num dotdot num

  • 8/7/2019 2_SimpleOnePassCompiler

    21/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 21/66

    An example of top-down parsing for input x=array [ num dotdot num ] of integer

  • 8/7/2019 2_SimpleOnePassCompiler

    22/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 22/66

    type simple

    id array [ simple ] of type simple integer char num dotdot num

    The selection of production for a nonterminal may involve trial-and-error.

    => backtracking

  • 8/7/2019 2_SimpleOnePassCompiler

    23/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 23/66Procedure of top-down pars inglet a pointed grammar symbol and pointed input symbol be g, a respectively. if( g N )

    then select a production P whose left part equals to g next to current production.( select

    expand derivation with that production P.

    else if( g = a ) then make g and a be a symbol next to current symbol.else if( g a ) back tracking

    let the pointed input symbol a be the symbol that moves back to steps same with thenumber of current symbols of underlying production

    eliminate the right side symbols of current production and let the pointed symbol g be theleft side symbol of current production.

  • 8/7/2019 2_SimpleOnePassCompiler

    24/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 24/66

    An example of top-down parsing for grammarG : { S->aSb | c | ab }, is acb , aabbL(G)?

    S/acbaSb/acbaSb/acb aaSbb/acb X(SaSb) move (SaSb) backtracking

    aSb/acbacb/acbacb/acbacb/acb(sc) move move

    so, acb L(G)It is finished in 7 steps including one backtracking.

    S/aabbaSb/aabbaSb/aabbaaSbb/aabbaaSbb/aabbaaaSbbb/aabb X(SaSb) move (SaSb) move (SaSb) backtracking

    aaSbb/aabbaacbb/aabb X(Sc) backtrackingaaSbb/aabbaaabbb/aabb X

    (Sab) backtrackingaaSbb/aabb X

    backtrackingaSb/aabbacb/aabb

    (Sc) bactracking aSb/aabbaabb/aabbaabb/aabbaabb/aabbaaba/aabb

    (Sab) move move moveso, aabbL(G)but process is too difficult. It needs 18 steps including 5 backtrackings.

  • 8/7/2019 2_SimpleOnePassCompiler

    25/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 25/66Predictive parsing (Recursive Decent Parsing, RDP) A strategy for the general top-down parsing

    Guess a production, see if it matches, if not, backtrack and try another.

    It may fail to recognize correct string in some grammar G and is tedious in processing.

    Predictive parsing

    is a kind of top-down parsing that

    predicts a production whose derived terminal symbol is equal to next input symbol whileexpanding in top-down paring. without backtracking. Procedure decent parser is a kind of predictive parser that is implemented by disjoint

    recursive procedures one procedure for each nonterminal, the procedures are patternedafter the productions.

  • 8/7/2019 2_SimpleOnePassCompiler

    26/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 26/66Procedure of predict ive parsing(RDP)let a pointed grammar symbol and pointed input symbol be g, a respectively. if( g N )

    select next production P whose left symbol equals to g and a set of first terminalsymbols of derivation from the right symbols of the production P includes a inputsymbol a.(select

    expand derivation with that production P. else if( g = a ) then make g and a be a symbol next to current symbol. else if( g a ) error

    => RDP requires Left Factoring and Elimination of Left-recursion for grammar

    G: A a, or

    A a 1 1 | a22 | | an n |

    for an d

  • 8/7/2019 2_SimpleOnePassCompiler

    27/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 27/66

    G : { SaSb | c | ab }According to predictive parsing procedure, acb , aabbL(G)?

    S/acb confused in { SaSb, Sab } so, a predictive parser requires some restriction in grammar, that is, there should be only

    one production whose left part of productions are A and each first terminal symbol of

    those productions have unique terminal symbol.

    so G => G1 : { S->aS' | c S'->Sb | ab }

    Requirements for a grammar to be suitable for RDP: For each nonterminal either

    1. A a, or2. A a11 | a22 | | ann |

    for and A may also occur

    if none of ai can follow A in a derivation and if we have A

    If the grammar is suitable, we can parse efficiently without backtrack.

    General top-down parser with backtracking

    Recursive(Recursive) Descent Parser without backtracking

  • 8/7/2019 2_SimpleOnePassCompiler

    28/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 28/66

    An Example of a Recursive Predictive Parser(RDP) for grammarG : type simple | id | array [ simple ] of typesimple

    integer | char | mun dotdot mun procedure match(t: token);

    beginif lookahead = t then lookahead = nexttoken

    else errorend

    procedure type; //cf type simple | id | array [ simple ] of typebegin

    if lookahead is in {integer, real, num} then simple //FIRST(simple)={integer,char,num}else if( lookahead = '' then begin match(''); match(id) end //FIRST(id) ={}else if( lookahead = array then begin //FIRST(array [ simple ] of type)={array}

    match(array);match('[');simple; match(']');match(of); type endelse error

    end procedure simple; //cf simple -> integer | char | mun dotdot mun

    beginif lookahead = integer then match(integer)else if lookahead = char then match(char)else if lookahead = num then begin match(num); match(dotdot); match(num) endelse error end;

  • 8/7/2019 2_SimpleOnePassCompiler

    29/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 29/66Left FactoringIf a grammar contains two productions of form

    S a and S a

    it is not suitable for top down parsing without backtracking. Troubles of this form can

    sometimes be removed from the grammar by a technique called the left factoring.

    In the left factoring, we replace { S a, S a } by{ S aS', S' , S' } (Hopefully and start with different symbols)cf. S= a|a=a(|) = aS', S'=|

    left factoring for G { SaSb | c | ab }SaS' | c cf. S=aSb | ab | c = a ( Sb | b) | c ) = a S' | c, S'=Sb|bS'Sb | b

    A concrete example:

    IF THEN |

    IF THEN ELSE

    is transformed into IF THEN S'

    S' ELSE |

  • 8/7/2019 2_SimpleOnePassCompiler

    30/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 30/66

    Example,

    for G : { SaSb | c | ab }

    According to predictive parsing procedure, acb , aabbL(G)?

    S/aabb unable to choose { SaSb, Sab ?} According for the feft factored gtrammar G1, acb , aabbL(G)?

    G1 : { SaS'|c S'Sb|b}

  • 8/7/2019 2_SimpleOnePassCompiler

    31/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 31/66Left RecursionAn Example of RDP for a left-recursive Grammar

    G: SSa|b, input x=baaS/baaSa/baa Saa/baa Saaa/baa .... (SSa) (SSa) (SSa)

    An Example of RDP for a right-recursive GrammarG1 : SbS ' S'aS'|, L(G)=L(G1)S/baabS'/baabS'/baabaS'/baabaS'/baabaaS'/baabaaS'/baabaa/baabaa/baa

    (SbS') (move) (S'aS') (move) (S'aS') (move) (S')

    A grammar is left recursive iff it contains a nonterminal A, such thatA

    +A, where is any string.

    Grammar {S S | c} is left recursive because of SS

    Grammar {S A, A Sb | c} is also left recursive because of SA Sb

    If a grammar is left recursive, you cannot build a predictive top down parser for it.

  • 8/7/2019 2_SimpleOnePassCompiler

    32/66

  • 8/7/2019 2_SimpleOnePassCompiler

    33/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 33/66

    In general, the rule is that

    If A A 1 | A2 | | A n and

    A 1 | 2 | | m (no i's start with A),then, replace by

    A 1R | 2R| | mR and

    R 1R | 2R | | nR |

    Exercise: Remove the left recursion in the following grammar:

    expr expr + term | expr - termexpr term

    solution: expr term rest rest + term rest | - term rest |

  • 8/7/2019 2_SimpleOnePassCompiler

    34/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 34/662.5 A TRANSLATOR FOR SIMPLE EXPRESSIONSConvert infix into postfix(polish notation) using SDT.Abstract syntax (annotated parse tree) tree vs. Concrete syntax tree

    Concrete syntax tree : parse tree Fig 2.2. p28.

    Abstract syntax tree, syntax tree : Fig 2.20. Concrete syntax : underlying grammar

  • 8/7/2019 2_SimpleOnePassCompiler

    35/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 35/66Adapting the Translation Scheme Embed the semantic action in the production

    Design a translation scheme Left recursion elimination and Left factoring

    Example, translate infix expression to post expression

    Infix expression system(language)

    L={0+1, 2-(5+1), ..}

    G=({expr,term},{+,-,0,19},P,expr)

    P ={expr expr+ term

    expr expr- term

    expr term

    term 0

    term 1

    term 9

    }

    postfix expression system(language)

    L'={0 1 +, 2 5 1 + -, ...}

    G'=({expr,term},{+,-,0,19},P',expr)

    P' = {expr expr term +

    expr expr term -

    expr term

    term 0

    term 1

    term 9

    }

  • 8/7/2019 2_SimpleOnePassCompiler

    36/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 36/66Example of translator design and executionA translation scheme and with left-recursion.

    Fig 2.19. Initial specification for infix-to-postfix translator with left recursion eliminatedexpr expr + term {printf{"+")}expr expr - term {printf{"-")}expr termterm 0 {printf{"0")}term 1 {printf{"1")}

    term 9 {printf{"0")}

    expr term restrest + term {printf{"+")} restrest - term {printf{"-")} restrest term 0 {printf{"0")}term 1 {printf{"1")}

    term 9 {printf{"0")}

  • 8/7/2019 2_SimpleOnePassCompiler

    37/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 37/66Procedure(RDP) for the Nonterminal expr, term, and restm a t c h ( t : t k o e n )

    {

    i f ( l o o k h e a d = t )

    l o o k a h e a d l e x a n ( ) ;

    e l s e e r r o r ( ) ;

    }

    expr term restrest + term {printf{"+")} rest

    rest - term {printf{"-")} restrest term 0 {printf{"0")}term 1 {printf{"1")}

    term 9 {printf{"0")}

  • 8/7/2019 2_SimpleOnePassCompiler

    38/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 38/66

    Optimizer and Translator1. expr() {

    2. term(); rest();3. }

    4. rest() rest()

    5. { {

    6. if(lookahead == '+' ) { L: if(lookahead == '+' ) {

    7. m('+'); term(); p('+'); rest(); m('+'); term(); p('+'); goto L;

    8. } else if(lookahead == '-' ) { } else if(lookahead == '-' ) {

    9. m('-'); term(); p('-'); rest(); m('-'); term(); p('-'); goto L;

    10. } else ; } else ;11. } }

    12. expr() {

    13. term();

    14. while(1) {

    15. if(lookahead == '+' ) {

    16. m('+'); term(); p('+');

    17. } else if(lookahead == '-' ) {

    18. m('-'); term(); p('-');19. } else break;

    20. }

  • 8/7/2019 2_SimpleOnePassCompiler

    39/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 39/662.6 LEXICAL ANALYSISreads and converts the input into a stream of tokens to be analyzed by parser.

    lexeme : a sequence of characters which comprises a single token.

    Lexical Analyzer Lexeme / Token Parser

    Removal of White Space and CommentsRemove white space(blank, tab, new line etc.) and commen ts

    ContsantsConstants: consider only integers for a while.

    eg) for input 31 + 28, output : ?input : 31 + 28output:

    tokenattribute, value(or lexeme)

  • 8/7/2019 2_SimpleOnePassCompiler

    40/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 40/66RecognizingIdentifiers Identifiers are names of variables, arrays, functions... A grammar treats an identifier as a token.

    eg) input : count = count + increment;output : ;Symbol table

    tokens attributes(lexeme)0

    12

    3

    idid

    countincrement

    Keywords are reserved, i.e., they cannot be used as identifiers.Then a character string forms an identifier only if it is no a keyword.

    punctuation symbols

    operators : + - * / := < >

  • 8/7/2019 2_SimpleOnePassCompiler

    41/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 41/66Interface to lexical analyzer

  • 8/7/2019 2_SimpleOnePassCompiler

    42/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 42/66A Lexical Analyzer

    c=getchcar(); ungetc(c,stdin);

    token representation

    #define NUM 256

    Function lexan()eg) input string 76 + ainput , output(returned value)76 token=NUM tokenval=76 (integer)

    + token=+a toekn=id, tokeval="a"

  • 8/7/2019 2_SimpleOnePassCompiler

    43/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 43/66

    A way that parser handles the token NUM returned by laxan()

    consider a translation scheme

    factor ( expr )| num { print(num.value) }

    in the RDP for the above production.

    #define NUM 256

    ...

    factor() {

    if(lookahead == '(' ) {match('('); exor(); match(')');

    } else if (lookahead == NUM) {

    printf(" %f ",tokenval); match(NUM);

    } else error();

    }

  • 8/7/2019 2_SimpleOnePassCompiler

    44/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 44/66The implementation of function lexan for simple alphabet={integer numbers, +,-,*,/...}

    1 ) # include 2) #include

    3) in t lino = 1;4) int tokenval = NONE;5) int lexan() {6) int t;7) while(1) {8) t = getchar();9) if ( t==' ' || t=='\t' ) ;10) else if ( t=='\n' ) lineno +=1;11) else if (isdigit(t)) {12) tokenval = t -'0';

    13) t = getchar();14) while ( isdigit(t)) {15 ) tokenval = tokenval*10 + t - '0';16) t =getchar();17) }18) ungetc(t,stdin);19) return NUM;20) } else {21 ) tokenval = NONE;22) return t; //+,-,...23) }24) }25) }

  • 8/7/2019 2_SimpleOnePassCompiler

    45/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 45/66

    2.7 INCORPORATION A SYMBOL TABLE

    The symbol table interfaces, operation, usually called by parser. insert(s,t): input s: lexeme

    t: tokenoutput index of new entry

    lookup(s): input s: lexemeoutput index of the entry for string s,

    or 0 if s is not found in the symbol table.Handling reserved keywords

    1. Inserts all keywords in the symbol table in advance.ex) insert("div", div)

    token lexeme

    insert("mod", mod)

    2. while parsing

    whenever an identifier s is encountered.

    if (lookup(s)'s token is in {keywords} ) s is for a keyword;else s is for a identifier;

  • 8/7/2019 2_SimpleOnePassCompiler

    46/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 46/66

    example

    preset

    insert("div",d iv );insert("mod",m od ); while parsing

    count = i + i + div

    lookup("count")=>0 insert("count",id );lookup("i") =>0 insert("i",id );lookup("i") =>4, idllokup("div")=>1,d iv

  • 8/7/2019 2_SimpleOnePassCompiler

    47/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 47/662.8 ABSTRACT STACK MACHINE An abstract machine is for intermediate code generation/execution.

    Instruction classes: arithmetic / stack manipulation / control flow3 components of abstract stack machine1) Instruction memory : abstract machine code, intermediate code(instruction)2) Stack 3)Data memory

    An example of stack machine operation.

    for a input a=11; b=7; (5+a)*b; 1) lexical analysis and parsed(5 a + b *) =>2) intermediate codes : push 5, rvalue 2, +, rvalue 3, *

    instruction memory1 push 5 pc2 rvalue 23 +

    4 rvalue 3

    5 *

    Data memory

    1 0

    2 11 a

    3 7 b

    Stack

    1 5

    2 115

    3 16

    4 716

    5 112

  • 8/7/2019 2_SimpleOnePassCompiler

    48/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 48/66L-value and r-value l-values a : address of location a

    r-values a : if a is location, then content of location aif a is constant, then value a

    eg) a :=(5 + a ) * b;

    lvalue a lvalue 2

    r value 5 push 5 r value a rvalue 2 r value of b rvalue 3

    Stack ManipulationSome instructions for assignment operation

    push v : push v onto the stack.

    rvalue a : push the contents of data location a.

    lvalue a : push the address of data location a.

    pop : throw away the top element of the stack.

    := : assignmen t for the top 2 elements of the stack. copy : push a copy of the top element of the stack.

  • 8/7/2019 2_SimpleOnePassCompiler

    49/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 49/66Translation of ExpressionsInfix expression(IE) SDD/SDTS Abstact macine codes(AMC) of postfix expression

    for stack machine evaluation.eg )

    IE: a + b, (PE: a b + ) IC: rvalue arvalue b+

    day := (1461 * y) div 4 + (153 * m + 2) div 5 + d( day 1462 y * 4 div 153 m * 2 + 5 div + d + :=)

    1) lvalue day 6) div 11) push 5 16) :=2) push 1461 7) push 153 12) div3) rvalue y 8) rvalue m 13) +4) * 9) push 2 14) rvalue d5) push 4 10) + 15) +

    A translation scheme for assignment-statement into abstract stack machine code e canbe expressed formally In the form as follows:stmt id := expr

    { stmt.t := 'lvalue' || id .lexeme || expr.t || ':=' }eg) day :=a+b lvalue day rvalue a rvalue b + :=

  • 8/7/2019 2_SimpleOnePassCompiler

    50/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 50/66Control Flow3 types of jump instructions :

    1. Absolute target location

    2. Relative target location( distance :Current Target)

    3. Symbolic target location( i.e. the machine supports labels)

    Control-flow instructions:

    l a b e l a : t h e j u m p ' s t a r g e t a

    g o t o a : t h e n e x t i n s t r u c t i o n i s t a k e n f r o m s t a t e m e n t l a b e l e d a g o f a l s e a : p o p t h e t o p & i f i t i s 0 t h e n j u m p t o a

    g o t r u e a : p o p t h e t o p & i f i t i s n o n z e r o t h e n j u m p t o a h a l t : s t o p e x e c u t i o n

  • 8/7/2019 2_SimpleOnePassCompiler

    51/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 51/66Translation of Statements Translation scheme for translation if-statement into abstract machine code.

    stmt if expr then stmt1{ out := newlabel1)

    stmt.t := expr.t || 'gofalse' out || stmt1.t || 'label' out }

    Translation scheme for while-statement ?

    1) a procedure generates a unique label(eg. zzzzz001, zzzzz002, etc) whenever it is called!

  • 8/7/2019 2_SimpleOnePassCompiler

    52/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 52/66Emitting a TranslationSemantic Action(Tranaslation Scheme):

    stmt if expr { out := newlabel; emit('gofalse', out) } then stmt1 { emit('label', out) }

    stmt id { emit('lvalue', id.lexeme) } := expr { emit(':=') }

    stmt if expr { out := newlabel; emit('gofalse', out) } then stmt1 { emit('label', out) ; out1 := newlabel; emit('goto', out1); }

    else stmt2 { emit('label', out1) ; } if(expr==false) goto out

    stmt1

    goto out1out : stmt2out1:

  • 8/7/2019 2_SimpleOnePassCompiler

    53/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 53/66

    implementation

    procedure stmt()

    var test,out:integer; begin // stmt id { emit('lvalue', id.lexeme) } := expr { emit(':=') } if lookahead = id then begin emit('lvalue',tokenval); match(id ); match(':='); expr(); emit(':='); end // stmt if expr { out := newlabel; emit('gofalse', out) } then stmt1 { emit('label', out) } else if lookahead = 'if' then begin match('if'); expr(); out := newlabel(); emit('gofalse', out); match('then'); stmt; emit('label', out) end

    else error(); end

  • 8/7/2019 2_SimpleOnePassCompiler

    54/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 54/66Control Flow with Analysisif E1 or E2 then S vs if E1 and E2 then S

    E 1 o r E 2 = i f E 1 t h e n t r u e e l s e E 2E 1 a n d E 2 = i f E 1 t h e n E 2 e l s e f a l s e

    T h e c o d e fo r E 1 o r E 2 .

    Codes for E1 Evaluation result: e1

    copy

    gotrue OUT

    pop

    Codes for E2 Evaluation result: e2

    label OUT

  • 8/7/2019 2_SimpleOnePassCompiler

    55/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 55/66

    T h e f u l l c o d e f o r if E 1 o r E 2 t h e n S ;

    codes for E1

    copy gotrue OUT1

    pop

    codes for E2

    label OUT1

    gofalse OUT2

    code for S label OUT2

    Exercise: How about if E1 and E2 then S;

    if E1 and E2 then S1 else S2;

  • 8/7/2019 2_SimpleOnePassCompiler

    56/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 56/66

    2 . 9 P u t t i n g t h e t e c h n i q u e s t o g e t h e r !

    infix expression postfix expression

    eg) id+(id-id)*num/id id id id - num * id / +

    Description of the TranslatorSyntax directed translation scheme(SDTS) to translate the infix expressions

    into the postfix expressions, Fig2.35

  • 8/7/2019 2_SimpleOnePassCompiler

    57/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 57/66

    Structure of the translator, Fig 2.36

    o global header file "header.h"

  • 8/7/2019 2_SimpleOnePassCompiler

    58/66

  • 8/7/2019 2_SimpleOnePassCompiler

    59/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 59/66The Parser Module parser.cSDTS Fig 2.35

    left recursion elimination

    New SDTS Fig 2.38

  • 8/7/2019 2_SimpleOnePassCompiler

    60/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 60/66

    An example of parsing for the input 12 div 5 + 2 ;

  • 8/7/2019 2_SimpleOnePassCompiler

    61/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 61/66The Emitter Module emitter.cemit (t,tval)

    The Symbol-Table Modules symbol.c and init.cSymbol.c

    data structure of symbol table Fig 2.29 p62

    insert(s,t)

    lookup(s)

    The Error Module error.cExample of execution

    input 12 div 5 + 2

    output 12

    5

    div

    2

    +

  • 8/7/2019 2_SimpleOnePassCompiler

    62/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 62/66

  • 8/7/2019 2_SimpleOnePassCompiler

    63/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 63/66

  • 8/7/2019 2_SimpleOnePassCompiler

    64/66

  • 8/7/2019 2_SimpleOnePassCompiler

    65/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 65/66

  • 8/7/2019 2_SimpleOnePassCompiler

    66/66

    Compiler, Computer Eng. Dept., Hanbat National University, Yoon-Joong Kim 66/66