Francis Compiler


  • 8/9/2019 Francis Compiler

    1/103

    Introduction to Compiler Writing

    R. C. T. Lee

    Institute Of Computer and Decision Sciences

    National Tsing Hua University

    Hsinchu, Taiwan

    Republic of China

    C. W. Shen

    First International Computer Corporation

    Taipei, Taiwan

    Republic of China

    S. C. Chang

    Department of Electrical Engineering and Computer Sciences

    Northwestern University

    Evanston, Illinois

    U. S. A.

    ABSTRACT

      This chapter introduces compiler writing techniques. It was written with the following observations made by the authors:

      (1) While compiler writing may be done without too much knowledge of formal language or automata theory, most books on compiler writing devote most of their chapters to these topics. This inevitably gives most readers the false impression that compiler writing is a difficult task. In fact, it is our observation that many people may easily develop an inferiority complex from reading these difficult books.

      We have deliberately avoided this formal approach. Each aspect of compiler writing is presented in such an informal way that any naive reader can finish reading this chapter within, say, five or six casual hours. As a matter of fact, a lot of readers finished reading this chapter before they went to sleep. They almost treated this chapter as a bed-time novel.

      (2) There are several different aspects of compiler writing. Most books give detailed presentations of every aspect and still fail to give the reader some feeling for how to write a compiler. We decided to use a different approach: we shall give an anatomy of a simple compiler. Most important of all, the internal data structure of this simple compiler will be clearly presented. Examples will be given, and a reader will be able to understand how a compiler works by studying these examples. In other words, we make sure that a reader will not lose his global view of compiler writing; he sees not only the trees, but also the forest.

    Section 1. Introduction.

      A compiler is a piece of software which translates a high-level language program into a low-level language program which can be executed by a computer. In many compilers, the low-level language is the machine language. For our purpose, we assume that our compiler translates a high-level language into an assembly language, simply because it is easier for us to comprehend assembly languages.

      It is our experience that a student, while learning compiler writing techniques, has to spend a major part of his time studying an assembly language. This is rather unfortunate because the code generation part of a compiler is quite straightforward. One just has to remember many details. By spending too much time learning assembly languages, a student loses precious time that could be spent studying the other important techniques in compiler writing.

      To make sure that a student will not have to spend too much time studying assembly languages, we decided to use a very simple assembly language. The assembly language is described in Section 5. As the reader can see, it is very easy to learn. Yet, it contains enough instructions that it will not be difficult for a student to translate a high-level language into this assembly language.

      If a student is asked to write a compiler with our assembly language as its target language, he will not be bogged down by the details of the assembly language. Instead, he can concentrate on studying the other important compiler writing skills.

      As far as the high-level language is concerned, it is the same story: we do not want to use a very complicated high-level language, such as FORTRAN or PASCAL. We do not want to use a subset of FORTRAN either, for various reasons which will become clear later. We therefore designed a high-level language, called FRANCIS, for our compiler writing purpose. FRANCIS is sufficiently complicated that writing a compiler for it is a non-trivial job. Still, FRANCIS is ideal for compiler writing because we deliberately designed it for this purpose. The reader will gradually appreciate this point.

      Essentially, FRANCIS is similar to FORTRAN, except that it contains some special features from PASCAL. It contains most of the well-known FORTRAN instructions, such as DIMENSION, SUBROUTINE, GO TO and so on. As far as the IF statement is concerned, we have an IF ... THEN ... ELSE type of statement, a typical feature borrowed from PASCAL. However, to simplify the syntax analysis discussion, we do not allow IF inside IF. All of the variables have to be declared, again a feature borrowed from PASCAL. We shall first assume that FRANCIS is not a recursive language, in the sense that subroutines in FRANCIS cannot call themselves. After making sure that students understand the basic concepts of compiler writing, we then add the recursive feature and also allow the IF statement to contain IF statements. By that time, it should not be difficult at all to understand how these can be done.

      The reader may ask: Why is the language called FRANCIS? Well, we named it FRANCIS in memory of the great saint, St. Francis of Assisi.

      This chapter was written when the world was facing an energy crisis. A lot of people were quite worried because they might face a dry summer (not enough gasoline for their cars) and a cold winter (not enough heating oil for their homes). The fact that millions of homeless people did not have enough food to eat or enough water to drink did not seem to bother them.

      We believe that the world does not lack energy. What we need is more warmth and more love. As the saying goes,

    ::= | { } < >

    English words enclosed by < and > are elementary constructs in FRANCIS. The symbol ::= means <letter>{<letter>|<digit>}.

      In FRANCIS, reserved words are used. These reserved words cannot be used as ordinary identifiers. They are all underlined in the definition. For instance, the following expression describes the syntax rules of array declaration instructions:

      <array declaration part> ::= DIMENSION <array declaration>

    For the definition of <array declaration>, consult the appendix.

    Section 2. A Glimpse of Compilers.

    Before getting into the details of a compiler, let us consider a very simple program and see what the compiler should do. The program is as follows:

    VARIABLE INTEGER: X, Y, Z
    X = 7
    Y =
    Z = X + Y
    END

    For the first instruction, declaring X, Y and Z to be integers, the compiler will generate the following assembly language instructions:

    X   DS  1
    Y   DS  1
    Z   DS  1

    For the instruction

    X = 7

    the compiler will create a constant, say called I1, and assembly language codes as follows:

    I1  DC  7
        LD  I1
        ST  X

      Similarly, for the instruction

    Y =

    the compiler will generate the following codes:

    I2  DC
        LD  I2
        ST  Y

      Finally, for the instruction

    Z = X + Y

    the compiler will generate

        LD  X
        AD  Y
        ST  T1
        LD  T1
        ST  Z

    The reader may wonder why the above sequence of codes is so inefficient. It could simply be

        LD  X
        AD  Y
        ST  Z

    In fact, we deliberately showed codes which are not efficient, for reasons that will become clear later.

      Ultimately, the high-level language program will be translated into the following assembly language codes:

    X   DS  1
    Y   DS  1
    Z   DS  1
    I1  DC  7
    I2  DC
        LD  I1
        ST  X
        LD  I2
        ST  Y
        LD  X
        AD  Y
        ST  T1
        LD  T1
        ST  Z

    This is what a compiler would do. But how does it do it?

      Basically, a compiler is divided into three parts: the lexical analyzer, the syntax analyzer and the code generator, as described below:

    High level language programs -> LEXICAL ANALYZER -> SYNTAX ANALYZER -> CODE GENERATOR

    Fig. 2.1

      (1) The input of the lexical analyzer is a sequence of characters representing a high-level language program. The major function of the lexical analyzer is to divide this sequence of characters into a sequence of tokens. For instance, consider the instruction

    VARIABLE INTEGER: X, Y, Z

    The output of the lexical analyzer is

    VARIABLE
    INTEGER
    :
    X
    ,
    Y
    ,
    Z

      (2) The input of the syntax analyzer is a sequence of tokens, which is the output of the lexical analyzer. The syntax analyzer divides the tokens into instructions. For each instruction, using the grammatical rules already stored, the syntax analyzer determines whether this instruction is grammatically correct. If not, error messages will be printed out. Otherwise, this instruction is decomposed into a sequence of basic instructions, and these instructions are fed into the code generator to produce assembly language codes.

    For instance, consider the instruction

    X=Y+U*V

    The syntax analyzer would determine that this instruction is indeed correct and later produce three basic instructions as follows:

    T1=U*V
    T2=Y+T1
    X=T2
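The decomposition into basic instructions can be sketched in Python. This is only an illustration under stated assumptions: it borrows Python's own `ast` parser in place of a FRANCIS syntax analyzer, and the function name `decompose` and the temporary names T1, T2, ... are hypothetical choices that merely mirror the example above.

```python
import ast

# Operators supported by this sketch (an assumption, not the FRANCIS grammar).
OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def decompose(target, expression):
    """Split `target = expression` into basic instructions, one operator each."""
    basic = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            left = walk(node.left)
            right = walk(node.right)
            counter += 1
            temp = f"T{counter}"  # a fresh temporary for every operator
            basic.append(f"{temp}={left}{OPERATORS[type(node.op)]}{right}")
            return temp
        raise ValueError("unsupported expression")

    result = walk(ast.parse(expression, mode="eval").body)
    basic.append(f"{target}={result}")
    return basic


basic_instructions = decompose("X", "Y+U*V")
```

Because the multiplication sits deeper in the expression tree than the addition, it is emitted first, which is why the temporaries come out in the same order as in the text.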

      (3) The code generator accepts the output from the syntax analyzer. Every basic instruction is now translated into a sequence of assembly language instructions. This can be easily done because the code generator can examine the pattern of the basic instruction and determine which sequence of codes it must generate.

      To make sure that the code generator can recognize the pattern easily, we may assume that each basic instruction produced by the syntax analyzer is of a standard form. In our case, we may assume that the standard form is a quadruple as follows:

    (Operator, Operand1, Operand2, Result).

    For instance, for the basic instruction

    X=Y+Z,

    its standard form will be

    (+, Y, Z, X)

    For the instruction

    X=-Z,

    the standard form will be

    (-, , Z, X)

    The code generator generates assembly language codes by examining the operator, operands and result.
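A minimal Python sketch of this pattern matching follows. It covers only the two patterns needed to reproduce the code shown in Section 2 for Z = X + Y, using the mnemonics LD, AD and ST from that section. The quadruple form chosen here for a plain copy, ("=", source, "", target), and the function name `generate` are assumptions made for illustration.

```python
def generate(quadruples):
    """Translate (operator, operand1, operand2, result) tuples into assembly lines."""
    code = []
    for operator, operand1, operand2, result in quadruples:
        if operator == "+":
            code += [f"LD {operand1}", f"AD {operand2}", f"ST {result}"]
        elif operator == "=":
            # Assumed form for a plain copy: ("=", source, "", target).
            code += [f"LD {operand1}", f"ST {result}"]
        else:
            raise ValueError(f"no code pattern for operator {operator!r}")
    return code


assembly = generate([("+", "X", "Y", "T1"), ("=", "T1", "", "Z")])
```

Translating each quadruple independently is what produces the redundant ST T1 / LD T1 pair seen in Section 2.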


    Section 3. The Lexical Analyzer

      As we indicated before, the function of the lexical analyzer is to group the input sequence of characters into tokens. This is the major function. Actually, as the reader will see, the lexical analyzer does more than identify tokens.

    Section 3.1 The Identification of Tokens.

      As far as the language FRANCIS is concerned, the identification of tokens is trivial. Note that tokens in FRANCIS are separated by delimiters, such as

      Note that for some other high-level languages, it is by no means easy to identify tokens. A typical example is FORTRAN. In FORTRAN,

    Delimiters

     1   ;
     2   (
     3   )
     4   =
     5   +
     6   -
     7   *
     8   /
     9
    10   Q
    11   ,
    12   :

    Table 1
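Splitting a line of FRANCIS into tokens at the delimiters can be sketched as follows. This is a hedged illustration, not the authors' routine: the set below contains only the delimiters of Table 1 that are legible in this copy, and treating blanks as separators is an assumption.

```python
# Legible delimiters from Table 1; entries 9 and 10 are not recoverable here.
DELIMITERS = set(";()=+-*/,:")


def tokenize(source):
    """Group the characters of `source` into tokens, splitting at delimiters."""
    tokens = []
    current = ""
    for ch in source:
        if ch in DELIMITERS or ch.isspace():
            if current:                 # close the token being collected
                tokens.append(current)
                current = ""
            if ch in DELIMITERS:        # a delimiter is itself a token
                tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


tokens = tokenize("VARIABLE INTEGER: X, Y, Z;")
```

Since '.' is not in the set, a real constant such as 2.7 stays in one piece.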

    Section 3.2. The Types of Tokens Recognized by the Lexical Analyzer.

      We have shown how the lexical analyzer identifies tokens. In this section, we shall illustrate some other functions of the lexical analyzer.

      Note that the major job of the lexical analyzer is to prepare everything for the syntax analyzer. Imagine that the syntax analyzer encounters the following instruction:

    VARIABLE INTEGER: X, Y, Z

    In this case, the syntax analyzer recognizes that this instruction is a declaration instruction because it detects the token 'VARIABLE'. In FRANCIS, the word 'VARIABLE' has its particular meaning and cannot be used in any other way. For instance, one cannot write the following instruction:

    VARIABLE = VARIABLE + 1;

      The word 'VARIABLE' is a reserved word in FRANCIS. Similarly, the words 'IF', 'THEN', 'ELSE' and so on are all reserved words in FRANCIS which cannot be used as ordinary variables.

      It is appropriate to point out here that in FORTRAN, there is no such reserved word concept. Indeed, one may have

    IF = IF + 1

    in FORTRAN, and this is absolutely not allowed in FRANCIS.

      Since the recognition of reserved words is so important for the syntax analyzer, it is appropriate for the lexical analyzer to do the job for it. Table 2 contains all of the reserved words used in FRANCIS. Since this table is also searched very frequently, we may use hashing to store and retrieve data for this table.

     1. AND        2. BOOLEAN    3. CALL         4. DIMENSION    5. ELSE
     6. ENP        7. ENS        8. EQ           9. GE          10. GT
    11. GTO       12. IF        13. INPUT       14. INTEGER     15. LABEL
    16. LE        17. LT        18. NE          19. OR          20. OUTPUT
    21. PROGRAM   22. REAL      23. SUBROUTINE  24. THEN        25. VARIABLE

    Table 2 (Reserved Word Table)

      In addition to recognizing reserved words, we may also recognize numerical constants in the lexical analyzer phase. This can be easily done. If a token starts with a digit, it must be either a real number or an integer. It is easy to differentiate these two because a real number contains the '.'.

      In summary, for any token, the lexical analyzer determines whether it is a delimiter, a real number, an integer, a reserved word or an identifier. We, of course, have never defined identifiers. An identifier is a token which is not a delimiter, a real number, an integer or a reserved word. In general, an identifier usually denotes a variable, an array, a label, the title of a program or the title of a subroutine.
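The classification summarized above can be sketched in a few lines of Python. The function name `classify` is hypothetical, the delimiter set holds only the legible entries of Table 1, and a Python `set` stands in for the hashed reserved word table.

```python
DELIMITERS = set(";()=+-*/,:")   # legible entries of Table 1 only

RESERVED = {                     # Table 2; a set gives hashed lookup
    "AND", "BOOLEAN", "CALL", "DIMENSION", "ELSE", "ENP", "ENS", "EQ",
    "GE", "GT", "GTO", "IF", "INPUT", "INTEGER", "LABEL", "LE", "LT",
    "NE", "OR", "OUTPUT", "PROGRAM", "REAL", "SUBROUTINE", "THEN",
    "VARIABLE",
}


def classify(token):
    """Return the class of a token, following the rules in the summary."""
    if token in DELIMITERS:
        return "delimiter"
    if token[0].isdigit():
        # A numerical constant is real exactly when it contains '.'.
        return "real" if "." in token else "integer"
    if token in RESERVED:
        return "reserved word"
    return "identifier"          # anything that is none of the above


kinds = [classify(t) for t in ["VARIABLE", ":", "X", "16", "2.7"]]
```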

    Section 3.3. The Representation of Tokens.

      We now ask the question: How does the lexical analyzer let the syntax analyzer know what the type of a token is?

      Before answering this question, let us note another problem. Our tokens are of different lengths. For internal representation, this is not an ideal situation.

      Note that a token, after the lexical analysis is executed, must belong to one of the following types:

    1. Delimiters.
    2. Reserved words.
    3. Integers.
    4. Real numbers.
    5. Identifiers.

      Each class of tokens has a table associated with it. We have already shown that delimiters are contained in Table 1 and reserved words are contained in Table 2. We now discuss why the other tokens also need tables.

      Whenever a token is identified as an integer, we put this integer into Table 3. This is necessary because finally, the code generator is going to use this table to generate constants. For instance, consider the following instructions:

    X = 5
    Y = 7

    After the execution of the lexical analysis phase, Table 3 will contain two integers as below:

    5
    7
    .
    .
    .

      Similarly, we shall designate Table 4 to contain real numbers.

      For each identifier, we shall put it into some location in Table 5, the Identifier Table. The Identifier Table is designed to contain three entries: the subroutine to which the identifier belongs, the type of the identifier and a pointer pointing to some other tables. These will be explained in detail later. Meanwhile, we merely have to note the following:

      (1) Every identifier is hashed into Table 5.

      (2) If an identifier appears more than once in the same subroutine, this identifier is put into Table 5 only once.

      (3) If the same identifier appears in different subroutines, it will occupy different locations in Table 5.

    We shall use the second column entry of Table 5 to point to the subroutine in which the identifier appears.
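The three rules for Table 5 can be sketched as a small hashed table. This is a sketch under loud assumptions: the hash function (sum of character codes), the linear probing on collision, the table size and the class name `IdentifierTable` are all hypothetical, and locations are counted from 0 here rather than from 1.

```python
class IdentifierTable:
    """Toy identifier table: name, subroutine location, type, pointer."""

    def __init__(self, size=100):
        self.slots = [None] * size

    def install(self, name, subroutine):
        """Return the location of (name, subroutine), inserting it if new.

        `subroutine` is the location of the enclosing subroutine's own entry,
        or None when `name` is itself a program or subroutine title.
        """
        size = len(self.slots)
        start = sum(map(ord, name)) % size      # assumed toy hash function
        for step in range(size):
            location = (start + step) % size    # linear probing (assumption)
            entry = self.slots[location]
            if entry is None:
                self.slots[location] = [name, subroutine, None, None]
                return location
            if entry[0] == name and entry[1] == subroutine:
                return location                 # rule (2): stored only once
        raise RuntimeError("identifier table is full")


table = IdentifierTable()
main = table.install("MAIN", None)
a1 = table.install("A1", None)
z_in_main = table.install("Z", main)
z_again = table.install("Z", main)
z_in_a1 = table.install("Z", a1)                # rule (3): a separate location
```

Keying each entry on the pair (name, subroutine) is what lets the same identifier occupy different locations in different subroutines.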

      Let us consider the following example.

      PROGRAM MAIN;
      VARIABLE INTEGER: U, V, W, Z;
      U = 56;
      V = 7;
      W = U + V;
      CALL A1(W, 16, Z);
      ENP;
      SUBROUTINE A1(INTEGER: X, Y, Z);
      Z = X + 2.7 * Y;
      ENS;

      In the above example, there are two subroutines: MAIN and A1 (the main program is also considered to be a subroutine). The identifiers which occur in the main program are

      MAIN
      U
      V
      W
      Z

    and the identifiers occurring in the subroutine A1 are

      A1
      X
      Y
      Z

      Note that the identifier Z occurs in both MAIN and A1. It will appear twice in Table 5.

      After the execution of the lexical analyzer, Table 5 may look like the table shown below:

          Identifier   Subroutine   Type   Pointer
     1
     2    W            3
     3    MAIN
     4    Z            3
     5    V            3
     6    X            8
     7
     8    A1
     9    U            3
    10    Y            8
    11    Z            8

    Table 5

    Consider the second entry of the above table. This entry contains Identifier W, which points to the third location of Table 5. Since the third location of Table 5 contains MAIN, this identifier W appears in Subroutine MAIN. Similarly, in the tenth location, we have Identifier Y pointing to the eighth location of this table. Since the eighth location corresponds to Subroutine A1, Y appears in Subroutine A1.

      We have shown that each token, after the lexical analysis phase, is found to be associated with a unique location of one of the tables from Table 1 to Table 5. The meaning of each table is summarized as follows:

    1. Delimiter Table
    2. Reserved Word Table
    3. Integer Table
    4. Real Number Table
    5. Identifier Table.

      We may say that each token is characterized by two numbers i and j if it occupies the jth location of the ith table. Indeed, to solve the problem that tokens are of different lengths, we may simply use this 2-tuple (i, j) to represent this token. Note that this representation is one-to-one in the sense that given (i, j), we can uniquely identify the token. For instance, for the above example, the 2-tuple (5,2) denotes Identifier W and (5,10) denotes Y.

      Let us now give a complete example to illustrate our idea by considering the following program:

      PROGRAM MAIN;
      VARIABLE INTEGER: U, V, M;
      U = 5;
      V = 7;
      CALL S1(U, V, M);
      ENP;
      SUBROUTINE S1(INTEGER: X, Y, M);
      M = X + Y + 2.7;
      ENS;

      After the execution of the lexical analyzer, Table 3 will be as follows:

    1   5
    2   7

    Table 3 (Integer Table)

      The Real Number Table will contain only one element as below:

    1   2.7

    Table 4 (Real Number Table)

      Let us assume that after identifiers are put into Table 5, Table 5 looks as follows:

          Identifier   Subroutine   Type   Pointer
     1    U            3
     2
     3    MAIN
     4    Y            10
     5    V            3
     6    M            3
     7
     8    X            10
     9    M            10
    10    S1

    Table 5 (Identifier Table)

    The entire program will now be represented as shown below:

    PROGRAM MAIN ;
    (2,21) (5,3) (1,1)

    VARIABLE INTEGER : U , V , M ;
    (2,25) (2,14) (1,12) (5,1) (1,11) (5,5) (1,11) (5,6) (1,1)

    U = 5 ;
    (5,1) (1,4) (3,1) (1,1)

    V = 7 ;
    (5,5) (1,4) (3,2) (1,1)

    CALL S1 ( U , V , M ) ;
    (2,3) (5,10) (1,2) (5,1) (1,11) (5,5) (1,11) (5,6) (1,3) (1,1)

    ENP ;
    (2,6) (1,1)

    SUBROUTINE S1 ( INTEGER : X , Y , M ) ;
    (2,23) (5,10) (1,2) (2,14) (1,12) (5,8) (1,11) (5,4) (1,11) (5,9) (1,3) (1,1)

    M = X + Y + 2.7 ;
    (5,9) (1,4) (5,8) (1,5) (5,4) (1,5) (4,1) (1,1)

    ENS ;
    (2,7) (1,1)
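The first lines of this encoding can be reproduced with a short Python sketch. It is illustrative only: the function name `encode` is hypothetical, the two illegible entries of Table 1 are left as `None`, and the identifier locations are simply copied from Table 5 above instead of being computed by hashing.

```python
# Table 1; entries 9 and 10 are not legible in this copy and are left empty.
DELIMITERS = [";", "(", ")", "=", "+", "-", "*", "/", None, None, ",", ":"]

# Table 2, in table order, so that position + 1 is the table location.
RESERVED = ["AND", "BOOLEAN", "CALL", "DIMENSION", "ELSE", "ENP", "ENS", "EQ",
            "GE", "GT", "GTO", "IF", "INPUT", "INTEGER", "LABEL", "LE", "LT",
            "NE", "OR", "OUTPUT", "PROGRAM", "REAL", "SUBROUTINE", "THEN",
            "VARIABLE"]


def encode(tokens, identifier_locations, integers, reals):
    """Replace every token by its 2-tuple (table number, location)."""

    def locate(table, value):
        if value not in table:          # constants are entered on first use
            table.append(value)
        return table.index(value) + 1

    pairs = []
    for token in tokens:
        if token in DELIMITERS:
            pairs.append((1, DELIMITERS.index(token) + 1))
        elif token in RESERVED:
            pairs.append((2, RESERVED.index(token) + 1))
        elif token[0].isdigit():
            if "." in token:
                pairs.append((4, locate(reals, token)))
            else:
                pairs.append((3, locate(integers, token)))
        else:
            pairs.append((5, identifier_locations[token]))
    return pairs


integers, reals = [], []
main_ids = {"MAIN": 3, "U": 1, "V": 5, "M": 6, "S1": 10}   # from Table 5
line1 = encode(["PROGRAM", "MAIN", ";"], main_ids, integers, reals)
line2 = encode(["U", "=", "5", ";"], main_ids, integers, reals)
line3 = encode(["V", "=", "7", ";"], main_ids, integers, reals)
```

A full version would pick the identifier location according to the subroutine being scanned, since M occupies location 6 in MAIN and location 9 in S1.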


    Section 4. The Syntax Analyzer

      The functions of the syntax analyzer are as follows:

      (1) The syntax analyzer divides the input tokens into instructions.

      (2) For each instruction, the syntax analyzer determines whether this instruction is grammatically correct.

      (3) If the instruction is not grammatically correct, it is rejected and some error message is printed out. Otherwise, it is decomposed into a sequence of basic instructions; each basic instruction is in a standard form. In our case, we shall use quadruples to represent basic instructions.

      (4) Some other information obtained by the syntax analyzer will be put into the various tables of the compiler.

      The first job of the syntax analyzer is to divide the input token stream into instructions. This is easy for FRANCIS because in FRANCIS, every instruction is terminated by the delimiter ';'. We would like to point out that this is not the case in FORTRAN, for instance, because FORTRAN assumes that an instruction is contained in one card unless, in the next card, a symbol appears in the 6th column. This makes the job of determining the termination of an instruction much harder.
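Dividing the token stream at the ';' delimiter can be sketched as below. The sketch works on the 2-tuples of Section 3.3, where ';' is (1,1); the function name `split_instructions` and the choice to raise an error on an unterminated last instruction are assumptions.

```python
SEMICOLON = (1, 1)   # location 1 of the Delimiter Table


def split_instructions(tokens):
    """Cut a stream of (table, location) tokens into instructions at ';'."""
    instructions = []
    current = []
    for token in tokens:
        current.append(token)
        if token == SEMICOLON:
            instructions.append(current)
            current = []
    if current:
        raise SyntaxError("last instruction is not terminated by ';'")
    return instructions


# U = 5 ; V = 7 ;  as encoded in Section 3.3
stream = [(5, 1), (1, 4), (3, 1), (1, 1), (5, 5), (1, 4), (3, 2), (1, 1)]
instructions = split_instructions(stream)
```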


      Note that even in PASCAL, the separation of tokens into instructions is more complicated. For instance, we do not allow an instruction as follows:

    VARIABLE INTEGER : X, Y, Z; REAL : U, V;

    which is perfectly legal in PASCAL. In FRANCIS, we shall have two instructions instead:

    VARIABLE INTEGER : X, Y, Z;
    VARIABLE REAL : U, V;

      In PASCAL, compound statements are allowed. They start with BEGIN and end with END. Within BEGIN and END, there can be a sequence of instructions, each of which is terminated with ';'. The reader should realize that we deliberately designed FRANCIS to have such kind of syntax rules so that a compiler can be easily written. In FRANCIS, we only allow an IF THEN ELSE instruction as follows:

    IF P AND Q THEN X = X + 1 ELSE X = X - 1;

      Note that the entire statement is only one instruction. It should not be difficult for the reader to modify his FRANCIS compiler to handle an instruction as follows:

    IF P AND Q THEN BEGIN X = X + 1; Y = X + 2 END
      ELSE BEGIN X = X + 2; Y = X + L END

    Section 4.1. The Recognition of Different Types of Instructions.

      In FRANCIS, instructions may be classified into the following categories:

    (1) program heading instructions
    (2) subroutine heading instructions
    (3) variable type declaration instructions
    (4) label declaration instructions
    (5) dimension declaration instructions
    (6) IF THEN ELSE instructions
    (7) GO TO instructions
    (8) assignment instructions
    (9) CALL instructions
    (10) I/O instructions

      Each type of instruction has its own syntax rules. The syntax analyzer has to recognize each instruction and branch to a subroutine to handle this instruction. For instance, consider a GO TO instruction. This instruction must start with the reserved word GTO. This GTO must be followed by a label. This label must be of the type of an identifier. That is, it must start with a letter, followed by digits, letters or nothing. Then, finally, we expect a ';' following this label. Since we have stipulated that every label must have been declared before it is used, we have to check whether the identifier following GTO is indeed a declared label or not. If the token following GTO is not an identifier or it was not declared as a label before, error messages will be printed.
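The checks for a GO TO instruction can be sketched on the 2-tuple representation. The sketch assumes GTO is entry 11 of Table 2, that type code 5 marks a label (the code suggested in Section 4.3), and that the type entries of the Identifier Table are available as a dictionary from location to code. The names `check_goto` and `identifier_types` are hypothetical.

```python
GTO = (2, 11)        # reserved word GTO in Table 2
SEMICOLON = (1, 1)   # ';' in Table 1
LABEL = 5            # type code for labels, as suggested in Section 4.3


def check_goto(instruction, identifier_types):
    """Return a list of error messages for a GTO instruction (empty if correct)."""
    if len(instruction) != 3 or instruction[0] != GTO:
        return ["not a well-formed GTO instruction"]
    errors = []
    table, location = instruction[1]
    if table != 5:
        errors.append("GTO must be followed by an identifier")
    elif identifier_types.get(location) != LABEL:
        errors.append("identifier after GTO was not declared as a label")
    if instruction[2] != SEMICOLON:
        errors.append("';' expected after the label")
    return errors


ok = check_goto([(2, 11), (5, 4), (1, 1)], {4: LABEL})
undeclared = check_goto([(2, 11), (5, 4), (1, 1)], {4: 4})
```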

      Let us consider another case. Imagine that we have already recognized the first token to be the reserved word 'VARIABLE'. We then check whether the next token is 'INTEGER', 'REAL' or 'BOOLEAN'. If yes, we go ahead to check whether ':' follows. If not, an error message is printed. After finding ':', we expect a sequence of distinct identifiers, none of which appeared before and each of which is followed by ','. Finally, we expect a ';' to terminate this instruction. For each identifier appearing in this instruction, we have to take some semantic action which will be explained in detail later. If the variables are not distinct, or some variable is not followed by ',', or some variable appeared before, error messages will be printed. The entire process of analyzing a variable type declaration instruction can be illustrated in Fig. 4.1.

    Fig. 4.1

      The reader can see that it is absolutely necessary to recognize the type of the instruction being analyzed. Let us temporarily assume that none of our instructions starts with a label (we shall discuss the handling of labeled instructions later). If this is the case, every instruction in FRANCIS, except the assignment instructions, starts with a reserved word. For instance, the program heading instruction starts with the reserved word PROGRAM and the IF THEN ELSE instructions start with the reserved word IF. In the following table, Table 4.1, we show how every instruction, except the assignment instruction, starts with a particular reserved word. If an instruction does not start with any reserved word, then it must be an assignment instruction.

    Instructions                             Reserved Word
    Program heading instructions             PROGRAM
    Subroutine heading instructions          SUBROUTINE
    Variable type declaration instructions   VARIABLE
    Label declaration instructions           LABEL
    Dimension declaration instructions       DIMENSION
    IF THEN ELSE instructions                IF
    GO TO instructions                       GTO
    Assignment instructions                  xxx
    Call instructions                        CALL
    I/O instructions (read)                  INPUT
    I/O instructions (write)                 OUTPUT

    Table 4.1.

      Again, we would like to point out here that in FORTRAN, we cannot identify the type of an instruction by checking the first token, because FORTRAN has no reserved words. However, those who are familiar with COBOL or BASIC will note that every COBOL or BASIC instruction begins with a reserved word and this reserved word uniquely determines the type of the instruction.

      Remember that in our compiler, every token is characterized by two integers. To check whether a token is a reserved word or not, we merely have to check whether the first integer is 2, because Table 2 contains all of the reserved words.

      We assumed that labels did not exist in the previous discussions. This is, of course, not a valid assumption because we do allow labels to appear in instructions. Actually, given an instruction, our first job is to check whether the beginning token is a label or not.

      Note that labels were not detected in the lexical analysis phase; they were merely identified as identifiers. Nevertheless, in FRANCIS, it is agreed that a label must have been declared before it is used. In other words, there must be an instruction I declaring label L before it is used. After instruction I is processed by the syntax analyzer, as the reader will see later, the syntax analyzer will put a note in the Identifier Table to declare that identifier L is a label. This is done by putting an appropriate entry in the type entry of the Identifier Table. Later, whenever we want to know whether any token is a label or not, we merely have to check this token in the Identifier Table. If it is indeed a label, the type entry of this token in the Identifier Table will declare this fact.

      At the top of the syntax analyzer, the type of each instruction is determined as follows:

      (1) Check the first token of an instruction. If this is a reserved word, the type of this instruction is determined by this reserved word.

      (2) If the first token is not a reserved word, check whether this token is a label. If this is not a label, this instruction must be an assignment instruction.

      (3) If the first token is a label, the type of this instruction is determined by the second token. If the second token is a reserved word, then the type of this instruction is determined by this token. Otherwise, it must be an assignment instruction.
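The three steps above can be sketched as one dispatch function over the 2-tuple tokens. The mapping below follows Table 4.1 and the reserved word numbers of Table 2. The function name, the dictionary of type entries and the "unrecognized" fallback for reserved words that do not open an instruction are assumptions of this sketch.

```python
# Reserved word location in Table 2 -> instruction type, following Table 4.1.
INSTRUCTION_TYPES = {
    21: "program heading", 23: "subroutine heading",
    25: "variable type declaration", 15: "label declaration",
    4: "dimension declaration", 12: "IF THEN ELSE", 11: "GO TO",
    3: "CALL", 13: "I/O", 20: "I/O",
}
LABEL = 5   # type code for labels, as suggested in Section 4.3


def instruction_type(instruction, identifier_types):
    """Determine the type of an instruction given as (table, location) tokens."""
    first = instruction[0]
    if first[0] == 2:                                   # step (1)
        return INSTRUCTION_TYPES.get(first[1], "unrecognized")
    is_label = first[0] == 5 and identifier_types.get(first[1]) == LABEL
    if not is_label:                                    # step (2)
        return "assignment"
    second = instruction[1]                             # step (3)
    if second[0] == 2:
        return INSTRUCTION_TYPES.get(second[1], "unrecognized")
    return "assignment"


types = {4: LABEL}   # hypothetical: identifier at location 4 is a declared label
plain = instruction_type([(5, 1), (1, 4), (3, 1), (1, 1)], types)
jump = instruction_type([(2, 11), (5, 4), (1, 1)], types)
labeled = instruction_type([(5, 4), (2, 12), (5, 1), (1, 1)], types)
```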

    Section 4.2. The Processing of the Program Heading Instruction.

    There can be only one program heading instruction in the entire program, and this instruction must appear as the first instruction. The reserved word corresponding to this instruction is PROGRAM. After this instruction is processed, the heading of this program will be classified as an identifier and put into the Identifier Table. In this Identifier Table, we do the following:

      (1) In the subroutine entry of the program heading in the Identifier Table, we have nothing. Later, whenever we notice that nothing appears in the subroutine entry of an identifier, we know that this identifier is the name of a subroutine. Note that the main program can also be considered as a subroutine.

      (2) In the pointer entry, we put a '1' there. We do this to indicate that the quadruples corresponding to this program start from the first location of the Quadruple Table. The Quadruple Table is illustrated in Fig. 4.2. For every subroutine, a sequence of quadruples is generated and stored consecutively in the Quadruple Table. In the Identifier Table, we have a pointer for every subroutine pointing to the starting point of the corresponding sequence of quadruples.

    [Fig. 4.2: the pointer entries of the main program M and of subroutines S1, S2 and S3 in the Identifier Table point to the starting locations of their blocks of quadruples in the Quadruple Table.]

    *ection 0.. The 4rocessing of variable Type 7eclaration Instructions.

    The reserved word to identify a variable type declaration instruction

    is F$2I$6#. $fter processing such an instruction! the following actions

    will be taen:

      (1)6et G1,X &,……,Gn be variables declared in this instruction. In

    the subroutine entry of Gi in the Identifier Table! put a pointer pointing to

    the subroutine it appears in. That is! suppose G i belongs to *ubroutine *

    which appears in 6ocation 6s in the Identifier Table! then put 6s  in the

    subroutine entry corresponding to Gi in the Identifier Table.

    (&) 4ut a code indicating the type of Gi in the type entry of Gi in the

    Identifier Table. We may arbitrarily set the codes as follows:

    $rray 1

    oolean &

    5haracter

    Integer 0

    6abel /

    2eal =

    /

  • 8/9/2019 Francis Compiler

    36/103

    If the syntax analyzer finds out that the type of Xi is boolean, "2" will be
    put into the type entry of Xi in the Identifier Table.

      (3) For each variable Xi, a quadruple is generated in the Quadruple
    Table. This quadruple will be used by the code generator later to assign
    the necessary space. Let the location in the Identifier Table occupied by
    Xi be Li. Then the quadruple corresponding to Xi is ((5,Li), , , ).

    Let us imagine that a certain variable X is declared as a real variable
    and the location occupied by this variable is 15 in the Identifier Table.
    Then the quadruple corresponding to this variable is ((5,15), , , ). In the
    type entry, we have a "6", indicating that it is a real variable. The code
    generator, after noting the 2-tuple (5,15), will examine the 15th location
    of the Identifier Table. It will notice the existence of "6" in the type entry
    and thus will generate the following assembly language code:

    X DS 2

    because a real number variable will occupy two memory locations.
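The three actions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout of the Identifier Table, the helper names `declare` and `storage_code`, and the one-word sizes for boolean, character and integer variables are hypothetical; the text itself only fixes the type codes and the two-word size of a real variable.

```python
# Type codes as listed in the text.
TYPE_CODES = {"ARRAY": 1, "BOOLEAN": 2, "CHARACTER": 3,
              "INTEGER": 4, "LABEL": 5, "REAL": 6}
# Assumed sizes in words; the text only states that a real takes two.
WORDS = {"BOOLEAN": 1, "CHARACTER": 1, "INTEGER": 1, "REAL": 2}


def declare(identifier_table, quadruples, location, subroutine_loc, type_name):
    """Actions (1)-(3): fill the entry and emit ((5, location), , , )."""
    entry = identifier_table[location]
    entry["subroutine"] = subroutine_loc      # action (1)
    entry["type"] = TYPE_CODES[type_name]     # action (2)
    quadruples.append(((5, location), None, None, None))  # action (3)


def storage_code(identifier_table, quadruple):
    """What the code generator later does with a declaration quadruple."""
    _table_no, location = quadruple[0]
    entry = identifier_table[location]
    name_of = {code: name for name, code in TYPE_CODES.items()}
    return f"{entry['name']} DS {WORDS[name_of[entry['type']]]}"


# X has already been hashed into location 15 by the lexical analyzer.
identifier_table = {15: {"name": "X", "subroutine": None,
                         "type": None, "pointer": None}}
quadruples = []
declare(identifier_table, quadruples, 15, 6, "REAL")
print(quadruples[0])
print(storage_code(identifier_table, quadruples[0]))
```

The empty fields of a quadruple are represented here by `None`.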

    Section 4.4. The Processing of Dimension Instructions.


    A dimension declaration instruction declares arrays. For every array
    declared, we have to record the type of the array, the dimensionality of
    the array and the size of the array in each dimension. For instance, suppose the
    size of a real array is declared as (16,5). We now have to record the fact
    that the array is real, the dimensionality of this array is 2, the size of the
    first dimension is 16, and the size of the second dimension is 5. To store
    this information, we need a new table, called the Information Table. We shall
    denote it Table 7. Table 7 is a one-dimensional table. In the above case,
    the information will be stored as follows:

    Location   Content   Meaning
    21         6         Real array
    22         2         Dimensionality
    23         16        The size of the first dimension
    24         5         The size of the second dimension

    Information Table (Table 7)

    In the Identifier Table, a pointer will be made to point to the Information
    Table as follows:

    Identifier   Subroutine   Type   Pointer
    A                         1      21

    Identifier Table

    Let us summarize the actions taken by the syntax analyzer to handle a
    dimension declaration instruction as follows:

      (1) Suppose A is declared as an array. In the subroutine entry of A in
    the Identifier Table, put an appropriate pointer to indicate the subroutine
    it belongs to.

      (2) In the type entry of A in the Identifier Table, put "1" into it,
    indicating that A is an array.

      (3) Suppose that the type code of the array is j, the dimensionality of
    A is i and the sizes of A corresponding to the different dimensions of A are
    S1, S2, ..., Si respectively. Put j, i, S1, S2, ..., Si into the
    Information Table (Table 7). Suppose, in the Information Table, the next
    available location is N; then the information j, i, S1, S2, ..., Si will
    be put into locations N, N+1, ..., N+i+1 respectively. Put N into the
    pointer entry of A in the Identifier Table.

      (4) In the Quadruple Table, generate a quadruple in which the first
    element identifies the location occupied by Array A in the Identifier
    Table. That is, if the array occupies the 5th location of the Identifier Table,
    then the quadruple generated will be

    ((5,5), , , )
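The four actions for a dimension instruction can be sketched as follows. The function name `declare_array`, the list-based Information Table and the dictionary-based Identifier Table are assumptions made for illustration; locations are 1-based as in the text.

```python
def declare_array(identifier_table, information_table, quadruples,
                  location, subroutine_loc, element_type_code, sizes):
    """Record an array declaration; return N, the start of its information."""
    n = len(information_table) + 1          # next available location (1-based)
    # Action (3): store j, i, S1, ..., Si in locations N .. N+i+1.
    information_table.extend([element_type_code, len(sizes), *sizes])
    entry = identifier_table[location]
    entry["subroutine"] = subroutine_loc    # action (1)
    entry["type"] = 1                       # action (2): 1 means array
    entry["pointer"] = n                    # action (3): pointer into Table 7
    quadruples.append(((5, location), None, None, None))  # action (4)
    return n


# Assume locations 1..20 of the Information Table are already in use.
information_table = [0] * 20
identifier_table = {5: {"name": "A", "subroutine": None,
                        "type": None, "pointer": None}}
quadruples = []
start = declare_array(identifier_table, information_table, quadruples,
                      location=5, subroutine_loc=1,
                      element_type_code=6, sizes=(16, 5))
print(start, information_table[20:])
```

With these inputs the real array of size (16,5) lands in locations 21 to 24, matching the example in the text.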

    The Information Table will not only contain information concerning
    arrays. As we shall see later, when we want to store other types of
    information, such as the information concerning parameters of a
    subroutine, we can also use this table. We may view the Information
    Table as a warehouse where many people can store things. The only thing
    that we have to do is to keep pointers pointing to the location where
    information is stored.

    The reader may wonder why we don't keep separate tables for
    different types of information. For instance, why don't we keep one table
    to store information concerning arrays and another table for information
    concerning subroutines? Note that if we keep several tables, we have to
    reserve quite a lot of memory locations for these tables. That is, we have
    to keep one array for each table. Since we do not know how large
    each table should be, we are forced to keep many rather large arrays,
    and later we may find out that many arrays are largely empty and a lot of
    memory space is wasted. By keeping one information table, we have
    decreased the degree of memory space wasting.


      There is another important point to be made here. Suppose we write the
    compiler in FORTRAN. We certainly cannot have an array which will
    store integers as well as characters. The Information Table, Table 7,
    contains only integers. That this is possible is due to the clever design, which
    makes sure that every piece of information is represented by integers.

    Section 4.5 The Processing of Label Instructions.

      Label instructions are easy to identify; they all start with the reserved
    word LABEL. After recognizing a label instruction, the following actions
    will be taken.

      (1) Arrange a pointer to be put into the subroutine entry of this label in
    the Identifier Table.

      (2) In the type entry, put


          .
          .
          .
          LABEL L100
          .
          .
          .
    L100  X=Y+Z

      In this case, the following events must take place:

      (1) The lexical analyzer recognizes L100 as an identifier and hashes it
    into the Identifier Table.

      (2) The syntax analyzer recognizes the reserved word LABEL and
    consequently decides that Identifier L100 must be a label. It then
    indicates so by putting


    Later, when the code generator encounters this quadruple, it will realize
    that the assembly language codes generated next should be labeled. For

    X=Y+Z

    the assembly language codes are

    LD Y
    ADD Z
    ST X

    Because this instruction is labeled, the code generator will generate a
    special label, say L, in this case. Thus we may have

    L   LD Y
        ADD Z
        ST X

    Imagine that we later encounter an instruction such as

    GTO L100

    In this case, the syntax analyzer checks the pointer entry of L100 in the
    Identifier Table and notices that the sequence of quadruples corresponding to
    the instruction labeled by L100 starts from location N in the Quadruple
    Table. It will therefore generate a quadruple expressing


    We hope that the reader can appreciate the rule that we do not use
    integers to represent labels, as is done in many languages. In FRANCIS,
    every integer recognized in the lexical analysis phase is indeed a number,
    and it is therefore appropriate to put it into the Integer Table. If we also
    used numbers to represent labels, then we would have to do some syntax
    analysis even in the lexical analyzer, which means that the lexical
    analyzer and the syntax analyzer could not be separated clearly.

    Section 4.7 The Processing of Assignment Instructions.

      We are now going to discuss the most interesting and important part of
    the syntax analyzer: the processing of assignment instructions. As we
    noted before, assignment instructions are the only instructions which do
    not start with a reserved word, and they must contain an "=" sign.

      In the following discussion, we shall first assume that the assignment
    instructions contain arithmetic expressions only. The method that we
    present can be easily extended to handle assignment instructions with
    logical expressions. Besides, let us temporarily assume that arrays do not
    appear in the assignment instructions.

    Consider the instruction

    X=Y+Z

    It is very easy for the syntax analyzer to recognize the operator

    (*,U,V,T1)
    (+,Y,T1,X)

    where T1 is a temporary variable.

      Our question is: How does the computer know that we should first
    execute multiplication and then addition?

      This problem can be solved by converting an assignment instruction
    into its Reverse Polish Notation. To obtain the Reverse Polish Notation
    of an assignment instruction involving arithmetic operations only, we
    shall first assume that there is an ordering of operators as follows,
    listed from the highest priority to the lowest:

    *, /
    +, -
    (, )
    =

    The procedure to generate the Reverse Polish Notation for an assignment
    instruction is as follows:

      (1) The symbols of the source string move left towards the junction (see
    Fig. 4.3).

      (2) When an operand (in our case, an operand is a variable) arrives, it
    passes straight through the junction.

      (3) When an operator (in our case, an operator is a delimiter) arrives, it
    pauses at the junction. If the stack is empty, this incoming operator slides
    down into the stack. Otherwise, it looks at the top of the stack. While the
    priority of the incoming operator is not greater than that of the operator at
    the top of the stack, the operators in the stack pop out and move to the left
    of the junction one by one. After the above process is finished, the incoming
    operator is pushed into the stack.

      (4) "(" always goes down into the stack.

      (5) When ")" reaches the junction, it causes all of the operators in the stack
    back as far as "(" to be shunted out, and then both parentheses disappear.

      (6) When all symbols have been moved to the left, pop out all of the
    operators in the stack to the left of the junction.
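The six rules above can be sketched in Python as follows. The function name `to_rpn` and the numeric priority values are assumptions chosen only to reproduce the ordering given in the text; each input symbol is taken to be a single token.

```python
# Higher number = higher priority, following the ordering in the text.
PRIORITY = {"*": 3, "/": 3, "+": 2, "-": 2, "(": 1, ")": 1, "=": 0}


def to_rpn(tokens):
    """Convert a tokenized assignment instruction to Reverse Polish Notation."""
    output, stack = [], []
    for tok in tokens:
        if tok == "(":                      # rule (4)
            stack.append(tok)
        elif tok == ")":                    # rule (5)
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                     # both parentheses disappear
        elif tok in PRIORITY:               # rule (3)
            while stack and PRIORITY[tok] <= PRIORITY[stack[-1]]:
                output.append(stack.pop())
            stack.append(tok)
        else:                               # rule (2): operands pass through
            output.append(tok)
    while stack:                            # rule (6)
        output.append(stack.pop())
    return output


print(to_rpn(list("X=Y+U*V")))
print(to_rpn(list("X=(Y+U)*V")))
```

Because "(" has a lower priority than the arithmetic operators, an operator arriving while "(" is on top of the stack is simply pushed, which is what rule (3) requires.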

    Let us consider the case of

    X=Y+U*V

    The steps of obtaining the Reverse Polish Notation of the above
    instruction are illustrated in Fig. 4.3.

    Step   Output           At junction   Remaining input   Stack (top first)
    (a)                                   X=Y+U*V
    (b)    X                              =Y+U*V
    (c)    X                              Y+U*V             =
    (d)    X,Y              +             U*V               =
    (e)    X,Y                            U*V               +,=
    (f)    X,Y,U            *             V                 +,=
    (g)    X,Y,U                          V                 *,+,=
    (h)    X,Y,U,V                                          *,+,=
    (i)    X,Y,U,V,*,+                                      =
    (j)    X,Y,U,V,*,+,=

    Fig. 4.3

    The transformed Reverse Polish Notation is thus

    X,Y,U,V,*,+,=.

    After transforming an assignment instruction into its Reverse Polish
    Notation, one can decompose it into a sequence of basic instructions very
    easily. The procedure is as follows. (We assume that every operator is a
    binary operator here. It is easy to extend this method to handle unary
    operators.)

    Step 1. Let i=1.
    Step 2. Pick the first operator from left to right. Let this operator be Oi.
    Step 3. Scan back and pick up the two operands immediately preceding Oi.
            Let these two operands be V1 and V2. Generate a basic
            instruction Ti=Oi(V1,V2), and replace V1, V2 and Oi in the string by Ti.
    Step 4. If the last symbol has already been scanned, stop. Otherwise, let
            i=i+1 and go to Step 2.

    Let us consider the assignment instruction

    X=Y+U*V

    again. The Reverse Polish Notation of the above instruction is

    X,Y,U,V,*,+,=.

    A sequence of basic instructions is now generated:

    String            Basic Instruction   Quadruple
    X,Y,U,V,*,+,=     T1=U*V              (*,U,V,T1)
    X,Y,T1,+,=        T2=Y+T1             (+,Y,T1,T2)
    X,T2,=            X=T2                (=,T2, ,X)
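The decomposition can be sketched with an operand stack, which is equivalent to scanning for the leftmost operator and replacing it and its two operands by a temporary. The function name `rpn_to_quadruples` and the use of `None` for an empty quadruple field are assumptions for illustration.

```python
def rpn_to_quadruples(rpn, operators=("+", "-", "*", "/", "=")):
    """Turn a Reverse Polish string into quadruples (op, arg1, arg2, result)."""
    stack, quads, temp = [], [], 0
    for sym in rpn:
        if sym not in operators:
            stack.append(sym)               # operands wait on the stack
            continue
        right = stack.pop()                 # the two operands immediately
        left = stack.pop()                  # preceding the operator
        if sym == "=":
            quads.append(("=", right, None, left))
        else:
            temp += 1
            result = f"T{temp}"
            quads.append((sym, left, right, result))
            stack.append(result)            # the temporary replaces them
    return quads


for quad in rpn_to_quadruples(["X", "Y", "U", "V", "*", "+", "="]):
    print(quad)
```

Passing `operators=("AND", "OR", "=")` handles the logical case in the same way.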

      Let us consider another example:

    X=(Y+U)*V

    The Reverse Polish Notation of the above instruction is obtained as follows:

    Step   Output           At junction   Remaining input   Stack (top first)
    (a)                                   X=(Y+U)*V
    (b)    X                              =(Y+U)*V
    (c)    X                              (Y+U)*V           =
    (d)    X                (             Y+U)*V            =
    (e)    X                              Y+U)*V            (,=
    (f)    X,Y              +             U)*V              (,=
    (g)    X,Y                            U)*V              +,(,=
    (h)    X,Y,U            )             *V                +,(,=
    (i)    X,Y,U,+                        *V                =
    (j)    X,Y,U,+          *             V                 =
    (k)    X,Y,U,+                        V                 *,=
    (l)    X,Y,U,+,V                                        *,=
    (m)    X,Y,U,+,V,*                                      =
    (n)    X,Y,U,+,V,*,=

    Fig. 4.4


    The Reverse Polish Notation for

    X=(Y+U)*V

    is thus

    X,Y,U,+,V,*,=

    A sequence of basic instructions is generated as follows:

    String            Basic Instruction   Quadruple
    X,Y,U,+,V,*,=     T1=Y+U              (+,Y,U,T1)
    X,T1,V,*,=        T2=T1*V             (*,T1,V,T2)
    X,T2,=            X=T2                (=,T2, ,X)

      In the previous discussion, we have only dealt with assignment
    instructions with arithmetic operations. In FRANCIS, we allow logical
    assignments as well. That is, we may have an instruction such as

    X=P AND Q OR R

    where P, Q and R are boolean variables. It should be easy for the reader to
    see that the above instruction can be transformed into its Reverse Polish
    Notation as follows:

    X,P,Q,AND,R,OR,=

    if we agree that AND should be performed before OR. The basic
    instructions are generated as below:

    String              Basic Instruction   Quadruple
    X,P,Q,AND,R,OR,=    T1=P AND Q          (AND,P,Q,T1)
    X,T1,R,OR,=         T2=T1 OR R          (OR,T1,R,T2)
    X,T2,=              X=T2                (=,T2, ,X)

    Section 4.8 The Processing of Instructions Containing Arrays.

      In the previous discussions, we assumed that arrays did not exist. If
    they do exist, we have to take special action. Consider
    the following instruction:

    A(I)=B(2,3)+4

    In this case, the subscript parameters appear because of the arrays A and B,
    and they must be eliminated first.

      Note that arrays must have been declared before they are used. Let us
    assume that A and B appear in the 7th and the 16th locations in the
    Identifier Table respectively. Then A is represented by (5,7) and B is
    represented by (5,16). In the Identifier Table, both of these identifiers
    will be indicated to be arrays. The syntax analyzer, when it encounters
    the symbol (5,7), will examine the 7th location of Table 5 and will
    find out that this identifier represents an array. Immediately, it will branch
    to a subroutine to handle this array.

    Let us consider a simple example first:

    X=B(I,J)+4


      In this case, we want to replace B(I,J) by a temporary variable. Let us
    assume that Array B was defined as follows:

    DIMENSION INTEGER:B(3,3)

    The traditional method of memory management will arrange the array as
    follows:

    B(1,1)
    B(2,1)
    B(3,1)
    B(1,2)
    B(2,2)
    B(3,2)
    B(1,3)
    B(2,3)
    B(3,3)

    That is, B(I,J) will be the ((J-1)M+I)th location of the array if the
    sizes of the two dimensions of B are M and N respectively. Thus, to replace
    B(I,J), we have to generate the following quadruples:

    (-,J,1,T1)     T1=J-1
    (*,T1,M,T2)    T2=T1*M
    (+,T2,I,T3)    T3=T2+I

    Finally, we need the following quadruple:

    (=,B,T3,T4),

    which is interpreted as

    T4=B(T3).

    The assembly language codes corresponding to

    (=,B,T3,T4)

    are

    LD B+T3-1
    ST T4

    The entire sequence of quadruples for

    X=B(I,J)+4

    is

    (-,J,1,T1)     T1=J-1
    (*,T1,M,T2)    T2=T1*M
    (+,T2,I,T3)    T3=T2+I
    (=,B,T3,T4)    T4=B(T3)
    (+,T4,4,T5)    T5=T4+4
    (=,T5, ,X)     X=T5
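The address computation can be checked with a short sketch. The helper names `offset` and `subscript_quadruples` are hypothetical; `offset` evaluates the formula (J-1)M+I directly, and `subscript_quadruples` emits the four quadruples that compute it and fetch the element.

```python
def offset(i, j, m):
    """1-based position of B(i, j) when the first dimension has size m."""
    return (j - 1) * m + i


def subscript_quadruples(array, i, j, m, first_temp=1):
    """Quadruples replacing array(i, j) by a temporary; returns (quads, temp)."""
    t1, t2, t3, t4 = (f"T{first_temp + k}" for k in range(4))
    quads = [("-", j, 1, t1),          # T1 = J-1
             ("*", t1, m, t2),         # T2 = T1*M
             ("+", t2, i, t3),         # T3 = T2+I
             ("=", array, t3, t4)]     # T4 = B(T3)
    return quads, t4


# The storage order of a 3-by-3 array: the first subscript varies fastest.
order = [offset(i, j, 3) for j in (1, 2, 3) for i in (1, 2, 3)]
print(order)
quads, temp = subscript_quadruples("B", "I", "J", "M")
print(quads, temp)
```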

      Let us consider an example in which an array appears at the left side of
    the "=" sign:

    A(I)=B(I,J)

    In this case, B(I,J) is replaced by a temporary variable in the same way as
    we discussed before. Suppose the temporary variable is T4. We then
    have the situation as follows:

    A(I)=T4

      The reader can see that we have three kinds of quadruples where the
    operators are all "=":

    (=,Y, ,X)     X=Y
    (=,A,I,X)     X=A(I)
    (=,X,A,I)     A(I)=X

      That the Code Generator will not be confused is due to the fact that A is
    a declared array while X and Y are not.

      We have discussed how to handle 2-dimensional integer arrays. It
    should be easy for the reader to extend the above idea to three- or
    four-dimensional arrays. It should not be difficult to handle real arrays either.

    Section 4.9. The Processing of IF THEN ELSE Instructions.

    The IF THEN ELSE instructions can be easily recognized because they
    all start with the reserved word IF. A general format of this kind of
    instruction is

    IF E THEN IN1 ELSE IN2

    where E is a logical expression and IN1 and IN2 are two simple
    instructions.

    The syntax analyzer, after analyzing the above instruction, would
    generate the following quadruple:

    (IF,P,QP1,QP2)


      where P is a logical variable, evaluated to 1 or 0 depending on the
    expression E and the values of its variables during run time, and QP1 is
    the starting quadruple corresponding to IN1. This quadruple will be
    interpreted as "If P is true, execute the codes corresponding to QP1;
    otherwise, execute the codes corresponding to QP2." Suppose the sequence of
    quadruples corresponding to IN1 starts from the 16th location of the Quadruple
    Table (Table 6); then QP1 will be represented by (6,16).

      Let us illustrate these ideas by an example. Consider

      IF P THEN X=X+1 ELSE X=X+2

      The syntax analyzer will generate the following quadruples, assuming
    that the quadruples will start from the 15th location of the Quadruple
    Table.

    15   (IF,P,(6,16),(6,19))
    16   (+,X,1,T1)             X=X+1
    17   (=,T1, ,X)
    18   (GTO, , ,(6,21))
    19   (+,X,2,T2)             X=X+2
    20   (=,T2, ,X)

    When the first quadruple of the above sequence is generated, the syntax
    analyzer knows that QP1 will start from the 16th location, but it does not
    know where QP2 will start. Therefore, the syntax analyzer will have to
    wait for the generation of the 19th quadruple. When this quadruple is
    generated, the syntax analyzer comes back to Quadruple 15 and fills in
    (6,19). The situation is the same for Quadruple 18. When this quadruple
    is generated, the syntax analyzer does not know how long the sequence of
    quadruples corresponding to X=X+2 will be. It therefore waits until all
    of the quadruples corresponding to X=X+2 are generated.
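The "come back and fill in" step can be sketched by reserving placeholder entries and patching them once the later locations are known. The function name `emit_if` and the list representation of the Quadruple Table (list index k holds location k+1) are assumptions for illustration.

```python
def emit_if(quads, cond, then_quads, else_quads):
    """Append an IF THEN ELSE sequence to quads, filling targets afterwards."""
    if_at = len(quads) + 1          # 1-based location of the IF quadruple
    quads.append(None)              # QP2 is not yet known: placeholder
    then_start = len(quads) + 1
    quads.extend(then_quads)
    gto_at = len(quads) + 1
    quads.append(None)              # end of ELSE part not yet known
    else_start = len(quads) + 1
    quads.extend(else_quads)
    end = len(quads) + 1            # first location after the whole instruction
    # Now come back and fill in the two placeholders.
    quads[if_at - 1] = ("IF", cond, (6, then_start), (6, else_start))
    quads[gto_at - 1] = ("GTO", None, None, (6, end))


quads = [None] * 14                 # locations 1..14 are already occupied
emit_if(quads, "P",
        [("+", "X", 1, "T1"), ("=", "T1", None, "X")],
        [("+", "X", 2, "T2"), ("=", "T2", None, "X")])
for location in range(15, 21):
    print(location, quads[location - 1])
```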

      Let us consider a more complicated case:

    IF P AND Q THEN X=X+1 ELSE X=X+2

      In this case, we first have to evaluate P AND Q, assuming that P and Q
    are boolean variables. The quadruples for this instruction will be as
    follows:

    15   (AND,P,Q,T1)
    16   (IF,T1,(6,17),(6,20))
    17   (+,X,1,T2)
    18   (=,T2, ,X)
    19   (GTO, , ,(6,22))
    20   (+,X,2,T3)
    21   (=,T3, ,X)

    Quadruple 15 means that the value of T1 is the AND of P and Q. The
    assembly language codes for this quadruple are:

    LD P
    AND Q
    ST T1


      In general, a logical expression can be as complicated as follows:

    X GT Y OR Y LT Z AND U EQ V.

    To evaluate the above expression, we may use the Reverse Polish
    Notation to transform it into the following form (we have assumed that
    we should evaluate AND operations before OR operations):

    X,Y,GT,Y,Z,LT,U,V,EQ,AND,OR

    The sequence of quadruples generated will be

    (GT,X,Y,T1)
    (LT,Y,Z,T2)
    (EQ,U,V,T3)
    (AND,T2,T3,T4)
    (OR,T1,T4,T5)

    Section 4.10. The Processing of GO TO Instructions.

      It is very simple to process GO TO instructions. For

    GTO LX

    the syntax analyzer first has to find out whether LX is a label or not. This
    can be done by checking through the Identifier Table. If it has not been
    declared as a label or it was declared to be something else, we produce an
    error message. If LX is indeed a label, we check whether the instruction
    with this label has appeared before. If it has appeared, we may generate the
    following quadruple:

    (GTO, , ,QPX)

    where QPX is the starting quadruple corresponding to the instruction
    labeled with LX. Otherwise, we have to wait until the syntax analyzer
    encounters the instruction with the label. If this instruction never appears
    or it appears more than once, we also produce error messages.
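One possible way to organize the checks described above is sketched below. The class `LabelResolver` and its method names are hypothetical; the sketch keeps, for each label, a list of GTO quadruples that are still waiting for the labeled instruction to appear.

```python
class LabelResolver:
    """Tracks declared labels and resolves GTO targets, possibly later."""

    def __init__(self, declared):
        self.declared = set(declared)   # names declared by LABEL instructions
        self.defined = {}               # label -> quadruple location
        self.pending = {}               # label -> indexes of unfilled GTOs

    def goto(self, label, quads):
        if label not in self.declared:
            raise ValueError(f"{label} has not been declared as a label")
        if label in self.defined:
            quads.append(("GTO", None, None, (6, self.defined[label])))
        else:
            self.pending.setdefault(label, []).append(len(quads))
            quads.append(None)          # filled in when the label appears

    def define(self, label, location, quads):
        if label in self.defined:
            raise ValueError(f"{label} appears more than once")
        self.defined[label] = location
        for index in self.pending.pop(label, []):
            quads[index] = ("GTO", None, None, (6, location))

    def finish(self):
        if self.pending:
            raise ValueError(f"labels never appear: {sorted(self.pending)}")


quads = []
resolver = LabelResolver(["L1", "L2"])
resolver.goto("L2", quads)              # forward reference: must wait
resolver.define("L1", 10, quads)
resolver.goto("L1", quads)              # backward reference: known at once
resolver.define("L2", 18, quads)
resolver.finish()
print(quads)
```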

    Section 4.11 The Processing of CALL Instructions.

      When a CALL instruction is encountered, the syntax analyzer will have
    to record the name of the subroutine being called and the arguments
    appearing in the instruction. To do this, we again use the Information
    Table.

      A CALL instruction is of the following form:

    CALL S1(P1, P2, ..., PN)

    The quadruple corresponding to the above instruction will be of the
    following form:

    (CALL, S1, ,(7,M))

    Location   Content
    M          N (the number of parameters)
    ...        Pointer pointing to P1
    ...        ...
    ...        Pointer pointing to PN

    Information Table (Table 7)

      Since we store identifiers and numbers in tables, we have to use two
    numbers to represent each Pi. Let us assume that our CALL instruction is

    CALL S1(W,16,A,57.)

    Suppose that S1, W and A occupy the 6th, 15th and 27th locations of the
    Identifier Table (Table 5) respectively. Suppose 16 occupies the 3rd
    location of the Integer Table (Table 3), and 57. occupies the 2nd location
    of the Real Table (Table 4). Note that the reserved word CALL occupies
    the 3rd location of the Reserved Word Table (Table 2). Let us assume
    that the next available location in the Information Table is 5. The
    quadruple corresponding to the above instruction will be as follows:

    ((2,3),(5,6), ,(7,5))

    Table 7, starting at location 5:

    Content   Meaning
    4         Number of parameters
    5,15      W
    3,3       16
    5,27      A
    4,2       57.
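The bookkeeping for a CALL instruction can be sketched as follows. The function name `emit_call` and the list-based tables are assumptions; each argument is given as the 2-tuple (table number, location) that the text uses to represent it.

```python
def emit_call(information_table, quads, call_word, subroutine, arguments):
    """Store the argument list in Table 7 and emit the CALL quadruple."""
    start = len(information_table) + 1          # next available location
    information_table.append(len(arguments))    # number of parameters
    for table_no, location in arguments:        # two numbers per argument
        information_table.extend([table_no, location])
    quads.append((call_word, subroutine, None, (7, start)))
    return start


information_table = [0] * 4                     # locations 1..4 already used
quads = []
emit_call(information_table, quads,
          call_word=(2, 3),                     # CALL in the Reserved Word Table
          subroutine=(5, 6),                    # S1 in the Identifier Table
          arguments=[(5, 15), (3, 3), (5, 27), (4, 2)])
print(quads[0])
print(information_table[4:])
```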

    Section 4.12. Examples.

      In this section we shall present two examples designed to show how
    quadruples are generated and how the various tables will look after the
    syntax analyzer is executed.

    Example 4.1

         PROGRAM A1
         VARIABLE INTEGER:X,Y,I
         DIMENSION INTEGER:A(12)
         LABEL L1, L2
         I=1
         X=5
         Y=11
    L1   IF (X GT Y) THEN (GTO L2) ELSE X=X+2
         A(I)=X
         I=I+1
         GTO L1
    L2   ENP

    Location   Identifier   Subroutine   Type   Pointer
    1
    2          I            5            4
    3
    4
    5          A1                               1  (Quadruple Table)
    6
    7          A            5            1      1  (Information Table)
    8          X            5            4
    9
    10
    11         Y            5            4
    12
    13
    14         L1           5            5      10 (Quadruple Table)
    15         L2           5            5      18 (Quadruple Table)

    Identifier Table (Table 5)

    Location   Content
    1          12
    2          1
    3          5
    4          11
    5          2

    Integer Table (Table 3)

    Location   Content
    1          4
    2          1      Array A
    3          12

    Information Table (Table 7)

    1    ((5,8), , , )                  X
    2    ((5,11), , , )                 Y
    3    ((5,2), , , )                  I
    4    ((5,7), , , )                  A
    5    ((5,14), , , )                 L1
    6    ((5,15), , , )                 L2
    7    ((1,4),(3,2), ,(5,2))          I=1
    8    ((1,4),(3,3), ,(5,8))          X=5
    9    ((1,4),(3,4), ,(5,11))         Y=11
    10   ((2,10),(5,8),(5,11),(0,1))    T1=X GT Y
    11   ((2,12),(0,1),(6,12),(6,13))   IF T1 GO TO 12, ELSE GO TO 13
    12   ((2,11), , ,(6,18))            GTO L2
    13   ((1,5),(5,8),(3,5),(0,2))      T2=X+2
    14   ((1,4),(0,2), ,(5,8))          X=T2
    15   ((1,4),(5,8),(5,7),(5,2))      A(I)=X
    16   ((1,5),(5,2),(3,2),(5,2))      I=I+1
    17   ((2,11), , ,(6,10))            GTO L1
    18   ((2,6), , , )                  ENP

    Quadruple Table (Table 6)

    Example 4.2.

    PROGRAM A2
    VARIABLE INTEGER: I,J,K
    DIMENSION INTEGER: A(20),B(4,5)
    I=2
    J=3
    CALL A3(I,J,K)
    A(K)=B(I,J)+2.7
    ENP
    SUBROUTINE A3(INTEGER:X,Y,K)
    VARIABLE INTEGER:Z
    Z=6
    K=(X-Z)*2+Y
    ENS


    Location   Identifier   Subroutine   Type   Pointer
    1
    2          X            11           4
    3          I            6            4
    4
    5          A            6            1      1  (Information Table)
    6          A2                               1  (Quadruple Table)
    7          K            6            4
    8          K            11           4
    9          B            6            1      4  (Information Table)
    10         J            6            4
    11         A3                               16 (Quadruple Table)
    12
    13         Y            11           4
    14
    15         Z            11           4
    16

    Identifier Table (Table 5)

    Location   Content
    1          4
    2          1      Array A
    3          20
    4          4
    5          2      Array B
    6          4
    7          5
    8          3
    9          5      I
    10         3
    11         5      J      CALL A3(I,J,K)
    12         10
    13         5      K
    14         7

    Information Table (Table 7)

    Location   Content
    1          20
    2          4
    3          5
    4          2
    5          3
    6          6
    7          1

    Integer Table (Table 3)

    Location   Content
    1          2.7

    Real Table (Table 4)


    Quadruples:

    Program A2

    1    ((5,3), , , )                  I
    2    ((5,10), , , )                 J
    3    ((5,7), , , )                  K
    4    ((5,5), , , )                  A
    5    ((5,9), , , )                  B
    6    ((1,4),(3,4), ,(5,3))          I=2
    7    ((1,4),(3,5), ,(5,10))         J=3
    8    ((2,3),(5,11), ,(7,8))         CALL A3(I,J,K)
    9    ((1,6),(5,10),(3,7),(0,1))     T1=J-1
    10   ((1,7),(0,1),(3,2),(0,2))      T2=T1*4
    11   ((1,5),(5,3),(0,2),(0,3))      T3=I+T2
    12   ((1,4),(5,9),(0,3),(0,4))      T4=B(T3)
    13   ((1,5),(0,4),(4,1),(0,5))      T5=T4+2.7
    14   ((1,4),(0,5),(5,5),(5,7))      A(K)=T5
    15   ((2,6), , , )                  ENP

    Subroutine A3

    16   ((5,2), , , )                  X
    17   ((5,13), , , )                 Y
    18   ((5,8), , , )                  K
    19   ((5,15), , , )                 Z
    20   ((1,4),(3,6), ,(5,15))         Z=6
    21   ((1,6),(5,2),(5,15),(0,6))     T6=X-Z
    22   ((1,7),(0,6),(3,4),(0,7))      T7=T6*2
    23   ((1,5),(0,7),(5,13),(0,8))     T8=T7+Y
    24   ((1,4),(0,8), ,(5,8))          K=T8
    25   ((2,7), , , )                  ENS


    Section 5. Code Generation

      The function of the code generator of a compiler is to generate assembly
    or machine language code. Conceptually, this is the easiest phase of
    compiler writing. In practice, it is always quite difficult to write because
    it is heavily machine-dependent. One must master the target
    assembly language, the operating system, the linkage editor, the file
    system and many other features of the target machine. In order to
    simplify our discussion, we assume that we are implementing our
    compiler on a very simple machine whose assembly language instructions
    are shown below:

    ADD   add
    AND   and operation
    ARG   argument
    BN    branch if negative
    BP    branch if positive
    BZ    branch if zero
    DC    define constant
    DIV   divide
    DS    define storage
    EXP   exponential
    END   end
    INC   increase by one
    JMP   unconditional jump
    JSB   jump to subroutine
    LD    load
    LDI   load-indirect
    MP    multiply
    NOP   no operation
    OR    or operation
    RTN   return
    ST    store
    STI   store-indirect
    SUB   subtract
    XI    input
    XOR   exclusive or operation
    XP    output

      We shall first assume that our language is non-recursive.
    The code generation of recursive languages will be discussed later. Let
    us consider Example 4.2. The code generator will first examine Tables 3
    and 4 and generate the following instructions:

    I1   DC 20
    I2   DC 4
    I3   DC 5
    I4   DC 2
    I5   DC 3
    I6   DC 6
    I7   DC 1
    R1   DC 2.7


      Then the code generator will examine the quadruples one by one and
    generate a sequence of instructions for each quadruple. For instance,
    consider the first quadruple. This quadruple is

    ((5,3), , , )

    After examining this quadruple, the code generator will find out that this
    quadruple calls for the following instruction:

    I DS 1

    because the third element in Table 5 corresponds to an identifier I which
    is an integer occupying one word of memory space.

    Consider Quadruple 4. This quadruple is

    ((5,5), , , )

    We shall first find out that (5,5) corresponds to an array, as indicated by
    its type in Table 5. The other characteristics of this array can be found by
    following its pointer to Table 7. In Table 7, we will note that Array A is a
    one-dimensional array and occupies twenty memory locations. Thus, the
    code generator will generate the following assembly language instruction:

    A DS 20

    Similarly, the assembly language instruction for the quadruple

    ((5,9), , , )

    will be

    B DS 20

    Consider Quadruple 6, which is

    ((1,4),(3,4), ,(5,3)).

    (1,4) will be found to correspond to "=", (3,4) will be found to correspond
    to "2" and (5,3) will be found to correspond to "I". Thus this quadruple
    will be translated into the following assembly language instructions:

    LD I4
    ST I

    Similarly, Quadruple 11 will be translated into

    LD I
    ADD T2
    ST T3
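The translation of these two kinds of quadruples can be sketched as follows. The helper names `operand` and `generate` are hypothetical, and the sketch handles only simple assignment and addition. It assumes, as in the examples, that (1,4) is "=" and (1,5) is "+", that table 0 denotes temporaries, and that integer and real constants are named In and Rn after their table locations.

```python
def operand(ref, names):
    """Assembly name of a (table, location) 2-tuple."""
    table, location = ref
    if table == 0:
        return f"T{location}"       # temporary variable
    if table == 3:
        return f"I{location}"       # integer constant defined by DC
    if table == 4:
        return f"R{location}"       # real constant defined by DC
    return names[location]          # identifier from Table 5


def generate(quad, names):
    """Assembly instructions for a simple '=' or '+' quadruple."""
    op, a, b, dest = quad
    if op == (1, 4) and b is None:  # simple assignment
        return [f"LD {operand(a, names)}", f"ST {operand(dest, names)}"]
    if op == (1, 5):                # addition
        return [f"LD {operand(a, names)}",
                f"ADD {operand(b, names)}",
                f"ST {operand(dest, names)}"]
    raise NotImplementedError(op)


names = {3: "I", 10: "J"}           # part of the Identifier Table of Example 4.2
print(generate(((1, 4), (3, 4), None, (5, 3)), names))    # Quadruple 6
print(generate(((1, 5), (5, 3), (0, 2), (0, 3)), names))  # Quadruple 11
```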


    Section 5.1 The Code Generation for Subroutine Calls

      Let us consider Quadruple 8 of Example 4.2:

    ((2,3),(5,11), ,(7,8)).

    The first 2-tuple corresponds to CALL and (5,11) corresponds to A3.
    Thus the first assembly language instruction corresponding to the above
    quadruple will be

    JSB A3.

    A subroutine call will usually involve the passing of arguments. The
    arguments of this jump-to-subroutine instruction can be found by
    examining the last element of the quadruple. Since the last element is
    (7,8), we examine the 8th element of Table 7. This will inform us that
    this calling instruction involves three parameters, namely I, J and K.

      There are many methods of passing the arguments. Let us introduce
    the simplest kind first.

    Method 1--Call by Address

    In this method, after the

    JSB A3

    instruction, we shall have a sequence of instructions to store the addresses
    of the arguments which are to be passed. Thus the instructions will be as
    follows:

    JSB A3
    ARG I
    ARG J
    ARG K

    After these instructions, an instruction corresponding to

    T1=J-1

    will follow. That is, we shall have a sequence of assembly language
    instructions as follows:

         JSB A3
    L1   ARG I
    L2   ARG J
    L3   ARG K
    L4   LD J
         SUB ONE
         ST T1

    When the program is finally executed, the addresses of I, J and K will be
    stored in L1, L2 and L3 respectively.

      Let us now take a look at how the arguments are passed to the
    called subroutine, A3 in this case. At the very beginning of the
    instructions corresponding to A3, we shall have a define-storage
    instruction

    A3 DS 1

    When the calling instruction

    JSB A3

    is executed, the machine will store the address immediately after the JSB
    instruction, namely L1, into the location A3. This action serves to build a
    link between a calling instruction and the called subroutine. As will
    become clear later, the arguments as well as the return address are all
    passed through this mechanism.

      Note that in Example 4.2, we have to pass parameters I, J and K to X,
    Y and K (these three variables are local ones) respectively. To pass I to X,
    we may use the following instructions:

    LDI A3
    ST X

      The load-indirect instruction loads the content of location L1 into the
    register. Since the address of I is stored in L1, the store instruction puts
    the address of I into location X.

      The other parameters can be transferred in a similar way. For instance,
    the transferring of J to Y will be as follows:

    INC A3
    LDI A3
    ST Y

    To transfer the last argument, we do the following:

    INC A3
    LDI A3
    ST K


      At the end of the code corresponding to the subroutine A3, there should
    be a return instruction:

    INC A3
    RTN A3

    Note that A3 finally points to L4 and control will eventually return to L4,
    as expected.

      It is important to note that a parameter inside a subroutine is not an
    ordinary variable: its address, not its value, is stored. Thus every
    time the subroutine refers to a parameter, it should use an indirect
    command so that the command actually refers to the variable in the
    calling procedure. For example, if Y is a parameter, then the statement

            Y=Y+1

    is translated into

            LDI Y
            ADD ONE
            STI Y

    In this way, the argument is changed in the calling program, which is
    what is desired.
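      The effect of the indirect instructions can be illustrated with a toy
    memory. The following is a minimal sketch, assuming a Python list as
    memory; the names memory, ldi, sti and increment_parameter and the two
    cell numbers are hypothetical, and the single machine register is
    reduced to a local variable.

```python
# A toy memory: each list index plays the role of a machine address.
memory = [0] * 8

CALLER_VARIABLE = 5    # assumed address of the variable in the calling program
PARAM = 2              # the subroutine's parameter cell; it holds an address


def ldi(cell):
    """Load-indirect: fetch the value at the address stored in `cell`."""
    return memory[memory[cell]]


def sti(cell, value):
    """Store-indirect: write `value` at the address stored in `cell`."""
    memory[memory[cell]] = value


def increment_parameter():
    """Mimic LDI / ADD ONE / STI applied to the parameter cell."""
    register = ldi(PARAM)      # LDI
    register = register + 1    # ADD ONE
    sti(PARAM, register)       # STI


memory[CALLER_VARIABLE] = 10       # the caller's variable holds 10
memory[PARAM] = CALLER_VARIABLE    # call by address: pass the address, not 10
increment_parameter()
print(memory[CALLER_VARIABLE])     # the caller's own variable has changed
```

    The parameter cell itself still holds the address; only the cell it
    points to is modified.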


    Method 2.--Call by Value

      The above method of passing parameters during subroutine calls is
    called "call by address" because the address of a variable is used.
    Since the address is passed, it may sometimes cause trouble. For
    instance, let us consider the calling of ADD1(X) again.

            SUBROUTINE ADD1(I:Integer)
            I=I+1
            RETURN
            END

    There is no problem if we have the following instructions:

            X=2
            CALL ADD1(X)
            OUTPUT X

    In this case, "3" will be printed out.

    Suppose instead that we have the following program:

            CALL ADD1(2)
            J=2
            OUTPUT J

    In this case, after

            CALL ADD1(2)

    is executed, the constant "2" will become "3". Therefore J will not be
    set to 2. Instead, it will become 3, which may be very baffling to many
    naive users.

      In contrast to "call by address", "call by value" avoids the above
    problem because only temporary copies of the parameter values are
    passed. Thus changes to the parameters are local to the subroutine.
    They do not have any lasting effect.

      To implement the "call by value" method, we shall store the values,
    instead of the addresses, of the parameters immediately after the
    jump-to-subroutine instruction. For instance, in the above case, we
    shall have

            JS  A
        L1  DC  I
        L2  DC  J
        L3  DC  K

    Within the subroutine, we may use the load-indirect instruction to
    transfer the parameters to local variables. All instructions in the
    subroutine would then refer to these local variables.

      Actually, if the language uses call by address, it is easy to simulate
    call by value by explicitly having the subroutine copy the arguments
    into local variables. The subroutine would then refer only to these
    copies, and never to the original variables.
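      The difference between the two methods, including the altered
    constant, can be sketched as follows. This is an illustration under
    stated assumptions rather than the code a compiler would emit: storage
    is modelled as a Python list, and the names add1, call_by_address,
    call_by_value and constants are hypothetical.

```python
def add1(cells, address):
    """Body of ADD1 under call by address: I = I + 1 through the address."""
    cells[address] += 1


def call_by_address(cells, address):
    """Hand the callee the caller's own cell."""
    add1(cells, address)


def call_by_value(cells, address):
    """Copy the value into a temporary cell and hand over the copy."""
    scratch = [cells[address]]
    add1(scratch, 0)                   # only the copy is changed


# Cell 0 is the storage the compiler reserved for the constant 2.
constants = [2]
call_by_value(constants, 0)
after_value = constants[0]             # the constant is untouched
call_by_address(constants, 0)
after_address = constants[0]           # the "constant" has been altered
print(after_value, after_address)
```

    The copy made in call_by_value is exactly the explicit copy into a local
    variable described above.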

      One simple way to protect the variables from the undesirable effect of
    call by address is simply to rename the variables. For example, we may
    do the following:

            Y=X
            CALL ADD1(Y)

    or the subroutine may use a local variable Y with the statement

            Y=X

    as the first statement and all X's replaced by Y's in the program body.
    Then the calling programs would not need Y=X and X is not changed by
    the subroutine call.

      In the following, we shall assume that call by address is used unless
    stated otherwise.


    Section 5.2 Code Generation for Recursive Languages

      So far we have assumed that within a Subroutine S, there is no calling
    of S. If a subroutine calls itself, what will happen? To make sure that
    the reader understands this problem, let us first consider the
    following problem.

    Suppose that we would like to compute the factorial function defined as
    below:

            FACTORIAL (1)=1
            FACTORIAL (N)=N*FACTORIAL (N-1)

    where N is a positive integer.

      The following FRANCIS program correctly computes the above function:

                PROGRAM FACTORIAL
                VARIABLE INTEGER: N
                VARIABLE REAL: FAC
                INPUT N
                CALL FS(N,FAC)
        AD1     OUTPUT FAC
                END
                SUBROUTINE FS(INTEGER: I,REAL:R)
                VARIABLE INTEGER:M
                LABEL L1
                IF I LE 1 THEN GTO L1
                M=I-1
                CALL FS(M,R)
        AD2     L1 IF I LE 1 THEN R=1.0 ELSE R=R*I
                ENS

      We have now denoted two addresses: AD1 and AD2. AD1 is the return
    address for CALL FS in the main program and AD2 is the return address
    for CALL FS within itself. Imagine that the first CALL FS is executed.
    In this case, AD1 is stored indirectly as the return address. Then,
    within this subroutine, we have a CALL FS instruction. This time, AD2
    is the return address. Since there is only one memory location prepared
    for the return address of Subroutine FS, AD2 will necessarily overwrite
    AD1. This is quite undesirable because we know that control has to go
    back to AD1 eventually.
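      The loss of AD1 can be made concrete with a small experiment. The
    sketch below is an assumption-laden illustration, not part of any
    compiler: run_calls is a hypothetical name, the two calls are simply
    replayed in order, and the second mode, which keeps one entry per
    active call, is shown only for contrast.

```python
def run_calls(use_stack):
    """Record return addresses for a call from main and a nested self-call.

    Returns the addresses in the order the two returns would use them.
    """
    slot = None            # the single location reserved for FS
    stack = []             # alternative: one entry per active call
    for return_address in ("AD1", "AD2"):    # main calls FS, FS calls FS
        if use_stack:
            stack.append(return_address)
        else:
            slot = return_address            # AD2 overwrites AD1
    returns = []
    for _ in range(2):                       # inner return, then outer return
        returns.append(stack.pop() if use_stack else slot)
    return returns


print(run_calls(use_stack=False))
print(run_calls(use_stack=True))
```

    With the single slot both returns go to AD2, so control never reaches
    AD1; with one entry per call the outer return finds AD1 again.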

      Many programming languages, such as FORTRAN, solve the above
    mentioned problem by simply prohibiting a subroutine from calling
    itself. A programming language which allows a subroutine to call itself
    is called a recursive programming language. ALGOL, LISP and PASCAL are
    all famous recursive programming languages. We would like to emphasize
    here that a programming language can easily be made recursive. In fact,
    the authors have used many FORTRAN compilers which can handle recursive
    calls.

      Let us now see why our program does compute the factorial if our
    compiler is appropriately implemented. Imagine that N is equal to 3.
    The following actions should take place:

      (1) FS is first called, with N being equal to 3 and FAC unknown. The
    return address is AD1.

      (2) Inside FS, I is now set to be 3. Since it is not less than or
    equal to 1, M is set to be 2. FS is called again and the return address
    is set to be AD2. It is important to note that I will be equal to 3
    when control comes back. To simplify the discussion, let us denote this
    return address as AD2_1.

      (3) Again, inside FS, I is now equal to 2, M is set to be 1 and FS is
    called. The return address is set to be AD2 with I equal to 2. To
    simplify the discussion, let us call this return address AD2_2.

      (4) Finally, inside FS, I is equal to 1. Control goes to AD2 and R is
    computed to be equal to 1.0.

      (5) Control then goes to AD2_2. Since I is equal to 2, R is computed
    to be equal to 2.

      (6) Control then goes to AD2_1. Since I is equal to 3 and R is equal
    to 2, R is computed to be equal to 6.

      (7) The value of FAC now becomes 6 and control goes to AD1.
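      The seven steps can be replayed in a language that already supports
    recursion. The following is a minimal sketch, assuming that a
    one-element Python list is an acceptable stand-in for the address of
    FAC; the names fs, fac and steps are hypothetical.

```python
def fs(i, r, trace):
    """Mirror of subroutine FS; `r` is a one-element list standing in for
    the address of the result variable (call by address)."""
    if i > 1:
        m = i - 1
        fs(m, r, trace)                # CALL FS(M,R); control returns to AD2
    # Statement labelled L1 (the return address AD2):
    r[0] = 1.0 if i <= 1 else r[0] * i
    trace.append((i, r[0]))            # record I and R after the assignment


fac = [None]                           # FAC is unknown before the call
steps = []
fs(3, fac, steps)
print(steps)
print(fac[0])
```

    Each entry of steps pairs the value of I with the value of R computed
    at that level, in the order of steps (4), (5) and (6) above.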

      There is another problem in the design of a compiler for a recursive
    language. In a non-recursive language compiler, a variable is assigned
    a fixed memory location. For a recursive language compiler, this
    obviously cannot be done. Consider the variable I within FS. The first
    time the subroutine is called, I is equal to 3. This value has to be
    saved in some way because we are going to use it later when control
    comes back to AD2. If I is assigned to a fixed location, we shall have
    serious problems.

      To support these properties of recursive languages, we may use a
    stack. At run time, each time a subroutine is called, an activation
    record containing the arguments and the appropriate return address is
    pushed onto the stack. To build a link between the calling instruction
    and the called subroutine, a global variable pointing to the current
    activation record may be used. That is, this global variable changes
    each time a subroutine is called. All of the return addresses and
    parameters can be found by using this global variable. Let us denote
    this global variable CURRACT (current activation record). We shall also
    use another global variable TOPSTACK to denote the top of the stack.

      Consider the above factorial program with the input N equal to 3.
    When the first

            CALL FS

    is executed, an activation record will be pushed onto the stack as
    shown below:

        TOPSTACK -->  +------------------------------------+
                      | FS activation record               |
        CURRACT  -->  | Dynamic link                       |
                      +------------------------------------+
                      | Calling routine activation record  |
                      +------------------------------------+

      We should note that activation records may well be of different
    lengths. Therefore, we shall store the stack as a linked list, so that
    we can pop records off later. The link pointing to the calling
    procedure's activation record is called the dynamic link.

      What should be stored in the activation record? Evidently, the
    following must be included:

      (1) The return address.
      (2) The parameters which should be passed.
      (3) The local variables.
      (4) The dynamic link.

    We may arrange an activation record as follows:

            Local variables
            Parameters
            Return address
            Dynamic link


      For instance, when the program is executed, the activation record
    will appear as below:

        TOPSTACK -->  FAC(=?)
                      N(=3)
                      Operating system entry point
        CURRACT  -->  Dynamic link

      Note the difference between a non-recursive language compiler and a
    recursive language compiler. In a recursive language compiler, the
    variables are assigned to locations in an activation record which is
    pushed onto a stack. As will be explained later, every time a
    subroutine call is made, it is necessary to put all of its variables
    into an activation record and push that record onto the stack.

      Suppose we have an activation record as follows:

        TOPSTACK -->  Z
                      Y
                      X
                      Return address
        CURRACT  -->  Dynamic link

    Let us consider the statement:

            Z=X+Y


    We shall not use the following kind of instructions any more:

            LD  X
            ADD Y
            ST  Z

    Instead, we shall do the following:

            LD  CURRACT(2)
            ADD CURRACT(3)
            ST  CURRACT(4)

    In other words, there is a table relating X, Y and Z to their offsets
    from the address CURRACT. This information may easily be stored in the
    Identifier Table.
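      How the offsets might be assigned and used can be sketched as
    follows. This is only an illustration: build_offsets and translate_add
    are hypothetical names, and treating X and Y as parameters and Z as a
    local variable is an assumption made so that the offsets agree with the
    record layout shown above.

```python
def build_offsets(parameters, local_variables):
    """Assign each name an offset from CURRACT.

    Offset 0 is the dynamic link and offset 1 the return address, so
    parameters start at 2 and local variables follow them.
    """
    names = list(parameters) + list(local_variables)
    return {name: index + 2 for index, name in enumerate(names)}


def translate_add(target, left, right, offsets):
    """Emit code for target = left + right using CURRACT-relative operands."""
    return [
        f"LD CURRACT({offsets[left]})",
        f"ADD CURRACT({offsets[right]})",
        f"ST CURRACT({offsets[target]})",
    ]


offsets = build_offsets(parameters=["X", "Y"], local_variables=["Z"])
for line in translate_add("Z", "X", "Y", offsets):
    print(line)
```

    The dictionary plays the role of the extra column that the Identifier
    Table would need for each name.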

      When the first FS is called, a new activation record is pushed onto
    the stack as below:

        New FS record       Main program record
        -------------       ----------------------------
        M(=?)
        R                   FAC(=?)
        I                   N(=3)
        AD1                 Operating system entry point
        Dynamic link        Dynamic link

      For this new activation record, some explanations are in order:

      (1) Dynamic link, AD1, I and R are put into the activation record
    before the

            JS  FS

    instruction.

      (2) M is put into the activation record by the code inside the
    subroutine.

      (3) Since we assume that call by address is used, I and R are
    pointers pointing to N and FAC respectively.

      The following instructions can be used to construct the new
    activation record and jump to the subroutine FS. (The reader has to
    remember that before the new activation record is constructed, CURRACT
    and TOPSTACK both refer to the old activation record.)

            INC TOPSTACK
            LD  TOPSTACK
            ST  TEMP        (The present TOPSTACK is temporarily stored.)
            LD  CURRACT
            STI TOPSTACK    (The dynamic link is now put into the new
                             activation record.)
            INC TOPSTACK
            LD  AD1
            STI TOPSTACK    (AD1 is put into the new activation record.)
            INC TOPSTACK
            LD  CURRACT
            ADD TWO
            STI TOPSTACK    (I is put into the new activation record and it
                             points to N.)
            INC TOPSTACK
            LD  CURRACT
            ADD THREE
            STI TOPSTACK    (R is put into the new activation record and it
                             points to FAC.)
            LD  TEMP
            ST  CURRACT     (The new CURRACT is now established.)
            JS  FS          (Jump to FS.)

      Note that this new activation record is only partially constructed
    before the

            JS  FS

    instruction because there is no way for the code generator to know the
    exact size of the new activation record at the point of call. The job
    of constructing this new activation record will be completed within
    Subroutine FS. In other words, within FS, there will be instructions
    which put Variable M into the new activation record.

      After the new activation record is constructed, whenever a variable
    is mentioned, the code generator uses its offset with respect to
    CURRACT. For instance, within FS, the instruction

            M=I-1

    will be translated into the following code:

            LD  CURRACT(2)
            SUB ONE
            ST  CURRACT(4)


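    Finally, the stack will appear as follows:

        Record 4        Record 3        Record 2        Record 1
        --------        --------        --------        ----------------------------
        M               M(=1)           M(=2)
        R               R               R               FAC(=?)
        I               I               I               N(=3)
        AD2             AD2             AD1             Operating system entry point
        Dynamic link    Dynamic link    Dynamic link    Dynamic link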

      At this point, no more subroutine calls are necessary and control
    goes to AD2. R is computed to be equal to 1. Activation Record 4 is
    popped off. I is pointing to 2 as registered in Activation Record 3.
    Activation Record 3 is popped off with R computed to be equal to 2.
    Control goes to AD2 with R computed to be equal to 6, and FAC is set to
    6 because of an indirect store for R. Activation Record 2 is now popped
    off. Control goes to AD1 and FAC is printed out as equal to 6.
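      The pushing and popping of activation records can be simulated
    directly. The following is a minimal sketch under stated assumptions: a
    Python list serves as the stack, list indices serve as addresses, the
    end of the list plays the role of TOPSTACK, and the names memory,
    curract, push_record, pop_record and fs are hypothetical. Return
    addresses are stored in the records but not used for control, since
    Python's own call mechanism transfers control here.

```python
memory = []          # the run-time stack; list indices act as addresses
curract = 0          # address of the current record's dynamic link


def push_record(return_address, *pointers):
    """Push dynamic link, return address and parameter pointers."""
    global curract
    base = len(memory)
    memory.append(curract)            # dynamic link to the caller's record
    memory.append(return_address)
    memory.extend(pointers)
    curract = base


def pop_record():
    """Discard the current record and follow the dynamic link back."""
    global curract
    caller = memory[curract]
    del memory[curract:]
    curract = caller


def fs():
    """FS with I at offset 2, R at offset 3 (both addresses), M at offset 4."""
    memory.append(None)                              # room for local M
    if memory[memory[curract + 2]] > 1:              # I is read indirectly
        memory[curract + 4] = memory[memory[curract + 2]] - 1    # M = I - 1
        push_record("AD2", curract + 4, memory[curract + 3])
        fs()
    i = memory[memory[curract + 2]]                  # I is intact after the call
    r_address = memory[curract + 3]
    memory[r_address] = 1.0 if i <= 1 else memory[r_address] * i
    pop_record()


# Main program record: dynamic link, OS entry point, N = 3, FAC unknown.
memory.extend([0, "OS", 3, None])
push_record("AD1", curract + 2, curract + 3)
fs()
print(memory)        # only the main program's record remains
```

    Because every call receives its own record, the value of I seen after
    the recursive call returns is the one belonging to that level, and the
    final store through R reaches FAC in the main program's record.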

      We would like to emphasize here that FAC has to be permanently
    changed after the subroutine calls. Therefore call by address must be
    used here.


    Section 5.3 Interpreter

      A compiler translates a high-level language into a low-level language
    to be executed later by the target machine. There are many cases where
    we do not want such a thorough translation because we can translate the
    source program into some intermediate code and directly execute this
    intermediate code. In such cases, we are not compiling; we are merely
    interpreting. After compiling, we have a machine-language object
    program, while we do not have that after interpreting. Every time we
    want to run this program, we have to interpret it again.

      Interpreting can be explained by considering an example. Consider the
    following high-level instruction:

            X=Y+U*V

    The intermediate code for the above instruction may consist of the
    following two quadruples:

            (*,U,V,T1)
            (+,Y,T1,X)

    A compiler will examine the above quadruples and then generate assembly
    language instructions. For the first quadruple these are as follows:

            LD  U
            MP  V
            ST  T1


    Note that this is the only thing that a compiler does. It does not
    bother to have these instructions executed.

      An interpreter will not generate these assembly language
    instructions. It is more straightforward; it simply executes the first
    quadruple by causing U to be multiplied by V and storing the result in
    T1. Later, it examines the second quadruple and immediately executes it
    by adding Y to T1 and storing the result into X.
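      An interpreter of this kind is short enough to sketch in full. The
    following is a minimal illustration, assuming quadruples of the form
    (operator, operand 1, operand 2, result) and a Python dictionary as the
    variable store; the names OPERATIONS and interpret and the sample
    values of Y, U and V are hypothetical.

```python
import operator

# Map each quadruple operator to the Python function that carries it out.
OPERATIONS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}


def interpret(quadruples, variables):
    """Execute each (operator, operand1, operand2, result) quadruple at once."""
    for op, left, right, result in quadruples:
        variables[result] = OPERATIONS[op](variables[left], variables[right])
    return variables


quadruples = [("*", "U", "V", "T1"), ("+", "Y", "T1", "X")]
values = interpret(quadruples, {"Y": 1, "U": 2, "V": 3})
print(values["X"])      # the value of 1 + 2 * 3
```

    No instructions are produced at any point; the result exists only in
    the variable store, which is why the program must be interpreted again
    on every run.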

      Since there is no object program generated, an interpreter requires
    much less space. This is why in many computers where memory space is
    limited, interpreters, instead of compilers, are used. The most famous
    language which uses the interpreter concept is BASIC. We began to see
    BASIC compilers coming out only recently.

      When one is debugging a program, it is very useful to have an
    interpretive mode incorporated into the compiler. That is, we may give
    the user a choice between generating an object code program and
    interpreting the program. This gives the user a quick way to find out
    whether his program is correct or not. After he is satisfied with his
    program, he can then request that it be compiled.


    Section 5.4 The Relationship between a Compiler and the Operating
    System

      Up to now, we have carefully avoided the issue of compiling
    input/output instructions. To translate an input/output instruction
    into machine dependent assembly language code, one cannot avoid using
    the software facilities provided by the operating system. For instance,
    we never write our own input/output drivers unless it is absolutely
    necessary to do so, because we can and should use the drivers provided
    by the operating system. Besides, we usually cannot skip the
    input/output system provided by the operating system even if we want to
    do so. This is due to the fact that no one can arbitrarily write data
    onto disks, as these actions may cause calamities. Note that there are
    system programs as well as other users' data on the disk, and your
    writing action may destroy them totally and crash the system.

      Furthermore, a compiler may allow a user to use subroutines already
    defined and stored in the system. It may also allow the use of overlay
    facilities. In both cases, a compiler writer has to know quite a lot
    about the operating system under which the compiler is going to work.

      In conclusion, there is no way, in practice, to separate a compiler
    from the operating system. In this chapter, we have not touched those
    problems because these issues cannot be easily discussed and the
    techniques used are usually not considered to be compiler writing
    techniques.


    Section 6 The Implementation Techniques.

      We have discussed compiler writing techniques. We now ask a crucial
    question: What language should we use to write compilers? Should we use
    machine languages or some high-level language?

      Of course, it is much easier to use a high-level language. The only
    reason for using some low-level language is that it produces a more
    efficient compiler. But the problem of maintaining such a compiler can
    be quite serious.

      There are several problems arising from using high-level languages to
    write compilers. Let us imagine that we want to write a PASCAL compiler
    and we have concluded that PASCAL is the ideal language to be used to
    write this compiler. Here is the dilemma. We do not have a PASCAL
    compiler yet. How can we use PASCAL to write any program?

      To solve this problem, we may use some language available to us to
    write a very small compiler which will work for a small subset of
    PASCAL instructions. We shall call the language comprising this PASCAL
    subset PASCAL1. Using PASCAL1, we can write a compiler for a larger
    subset of instructions in PASCAL. Thus a PASCAL2 compiler is now
    produced. We may repeat this process until the entire set of PASCAL
    instructions is included in some compiler, and this compiler is a
    PASCAL compiler.


      Let us consider another problem. Suppose we want to write a PASCAL
    compiler for the PRIME 250. Yet, for some reason, we would like to use
    a VAX machine. One reason may be that there is a very good PASCAL
    compiler on the VAX machine, which means that we can use the excellent
    PASCAL language to write our compiler. Another possible reason is that
    the PRIME 250 has not arrived yet.

      If the PRIME 250 is available to us, we can use it to test the
    correctness of our compiler. That is, we can always execute the object
    code program on the PRIME 250. If the PRIME 250 is not available, we
    must have an emulator of the PRIME 250 on the VAX machine. We can use
    this emulator to test the object programs produced by our compiler.

      After the correctness of our compiler is established, we still have
    to transport it to the PRIME 250. How do we do that? Remember that this
    compiler is written in PASCAL, which does not exist on the PRIME 250
    machine. What we do is to compile our own compiler. The result is a
    PRIME 250 a