Francis Compiler

8/9/2019 Francis Compiler

1/103

Introduction to Compiler Writing

R. C. T. Lee

Institute Of Computer and Decision Sciences

National Tsing Hua Universit

Hsinc!u" Tai#an

Repu$lic Of C!ina

C. W. S!en

%irst International Computer Corporation

Taipei" Tai#an

Repu$lic Of C!ina

S. C. C!ang

Department Of &lectrical &ngineering and Computer Sciences

Nort!#estern Universit

&vanston" Illinois

U. S. '.


2/103

'(STR'CT

This chapter introduces compiler writing techniques. It was written

with the following observations made by the authors:

(1) While compiler writing may be done without too much nowledge

of formal language or automata theory! most boos on compiler writing

devote most of their chapters on these topics. This inevitably gives most

readers a false impression that compiler writing is a difficult tas. In fact!

it is our observation that may people many easily develop some inferior

comple" out or reading these difficult boos.

We have deliberately avoided this formal approach. #ach aspect of

compiler writing is presented in such an informal way that any naive

reader can finish reading this chapter within! say five or si"! casual hours.

$s a matter of fact! a lot of readers finished reading this chapter before

they went to sleep. They almost treated this chapter as a bed%time novel.

(&) There are several different aspects of compiler writing. 'ost

boos would give detailed presentations on every aspect and still fail to

give the reader some feeling on how to write a compiler. We decided to

use a different approach we shall give an anatomy of a simple compiler.

'ost important of all! the internal data structure of this simple compiler

will be clearly presented. #"amples will be given and a reader will be

able to understand how a compiler wors by studying these e"amples. In

&


3/103

other words! we mae sure that a reader will not lose his global view of

compiler writing he sees not only trees! but also forests.


4/103

*ection 1. Introduction.

$ compiler is a software which translates a high%level language

program into a low%level language program which can be e"ecuted by a

computer. In many compilers! the low%level language is the machine

language. +or our purpose! we assume that our compiler translates a

high%level language into an assembly language simply because it is easier

for us to comprehend assembly languages.

It is our e"perience that a student! while learning compiler writing

technique! has to spend a ma,or part of his time studying an assembly

language. This is rather unfortunate because the code generation part of a

compiler is quite straightforward. -ne ,ust has to remember many

details. y spending too much time in learning assembly languages! a

student loses his precious time to study the other important techniques in

compiler writing.

To mae sure that a student would not have to spend too much time

in studying assembly languages! we decided to use a very simple

assembly language. The assembler language is described in *ection /. $s

the reader can see! it is very easy to learn .

0


5/103

et! it contains enough instructions that it will not be difficult for a

student to translate a high%level language into this assembly language.

If a student is ased to write a compiler with our assembly language as

its target language! he will not be bogged down by the details of the

assembly language. Instead! he can concentrate all his mind to study the

other important compiler writing sills.

*o far as the high%level language is concerned! it is the same story we

do not want to use a very complicated high%level language! such as

+-2T2$3! or 4$*5$6. We do not want to use a subset of +-2T2$3!

either for various reasons which will become clear later. We therefore

designed a high%level language! called +2$35I*! for our compiler

writing purpose. +2$35I* is sufficiently complicated that writing a

compiler for it is a non%trivial ,ob. *till! +2$35I* is ideal for compiler

writing because we deliberately designed it for this purpose. The reader

will gradually appreciate this point.

#ssentially! +2$3I5* is similar to +-2T2$3! e"cept that it contains

some special features from 4$*5$6. It contains most of the well%nown

+-2T2$3 instructions! such as 7I'#3*I-3! *82-8TI3#! 9- T-

and so on. *o far as the I+ statement is concerned! we have an I+.

T;# . #6*# type of statement! a typical feature borrowed from

4$*5$6. ;owever! to simplify the synta" analysis discussion! we do not

allow I+ inside I+. $ll of the variables have to be declared. $gain! a

feature borrowed from 4$*5$6. We shall first assume that +2$35I* is

/


6/103

not a recursive language in the sense that subroutines in +2$35I* can

not call themselves. $fter maing sure that students understand the basic

concept of compiler writing! we then add the recursive feature and also

allow the I+ statement to contain I+ statements. y that time! it should

not be difficult at all to understand how these can be done.

The reader may as: Why is the language called +2$35I*. Well! we

name it +2$35I* in memory of the great saint! *t. +rancis of $ssisi.

This chapter was written when the world was facing an energy crisis.

$ lot of people were quite worried because they might face a dry summer

(not enough gasoline for their cars) and a cold winter (not enough heating

oil for their homes). The fact that millions of homeless people having not

enough food to eat and not enough water to drin did not seem to bother

them.

We believe that the world does not lac energy. What we need is more

warmth and more love. $s the saying goes!


7/103

: > ? @ A B C

#nglish words enclosed by B and C are elementary constructs in

+2$35I*! The symbol ::> means BletterC@BletterC?BdigitCA.

In +2$35I* reserved wors are used. These reserved words can not

be used as ordinary identifiers. They are all underlined in the definition.

+or instance! the following e"pression describes the synta" rules of array

declaration instructions:

Barray declaration part ::> 7I'#3*I-3 array declaration +or the

definition of array definition! consult the appendi".

E


8/103

*ection &. $ 9limpse of 5ompilers.

efore getting into the details of a compiler! let us consider a very

simple program and see what the compiler should do. The program is as

follows:

F$2I$6# I3T#9#2: G! ! H

G > E

>

H >G %

#37

+or the first instruction declaring G! and H to be integers! the

compiler will generate the following assembly language instructions:

G 7* 1

7* 1

H 7* 1

+or the instruction

G > E

the compiler will create a constant! say called Il! and assembly language

codes as follows:

J


9/103


10/103

I1 75 E

67 I1

*T G

*imilarly! for instruction

>

the compiler will generate the following codes:

I& 75

67 I&

*T

+inally! for the instruction

H > G K

the compiler will generate

67 G

$7

*T T1

67 T1

*T H

The reader may wonder why the above sequence of codes are so

inefficient. They should simply be

67 G$7

*T H

In fact! we deliberately showed the codes which are not

efficient for reasons that will become clear later.

8ltimately! the high%level language program will be translated into the

following assembly language codes:

1L


11/103

G 7* 1

7* 1

H 7* 1

I1 75 E

I& 75

67 I1

*T G

67 I&

*T

67 G

$7

*T T1

67 T1

*T H

This is what a compiler would do. ut! how does it do itM

asically! a compiler is divided into three parts: the le"ical analyDer!

the synta" analyDer and the code generator! as described below:

;igh level 6#GI5$6 *3T$G 5-7#

language programs $3$6H#2 $3$6H#2 9#3#2$T-2

fig. &.1

(1) The input of the le"ical analyDer is a sequence of characters

representing a high%level language program. The ma,or function of the

le"ical analyDer is to divide this sequence of characters into a sequence of

11


12/103

toens. +or instance! consider the instruction.

F$2I$6# I3T#9#2: GN N H

The output of the le"ical analyDer isF$2I$6#

I3T#9#2

:

G

N

N

H

(&) The input of the synta" analyDer is a sequence of toens which is

the output of the le"ical analyDer. The synta" analyDer divides the toens

into instructions. +or each instruction! using the (grammatical rules)

already stored! the synta" analyDer determines whether this instruction is

grammatically correct. If not! error messages will be printed out.

-therwise! this instruction is decomposed into a sequence of basic

instructions and these instructions are fed into the code generator to

produce assembly language codes.

+or instance! consider the instruction

G>K8OF

The synta" analyDer would determine that this instruction is indeed

correct and later produce three basic instructions as follows

T1>8OF

T&>KT1

G>T&

() The code generator accepts the output from the synta" analyDer.

1&


13/103

#very basic instruction is now translated into a sequence of assembly

language instructions. This can be easily done because the code generator

can e"amine the pattern of the basic instruction and determine which

sequence of codes it must generate.

To mae sure that the code generator can recogniDe the pattern easily!

we may assume that each basic instruction produced by the synta"

analyDer is of a standard from. In our case! we may assume that the

standard form is a quadruple as follows:

(operator! -perand1! -perand& ! 2esult).

+or instance! for basic instruction

G>KHN

its standard form will be

(KN N HN G)

+or the instruction

G>%HN

The standard form will be

(% N N HN G)

The code generator generates assembly language codes by e"amining the

operator! operands and result.

1


14/103

*ection . The 6e"ical $nalyDer

$s we indicated before! the function of the le"ical analyDer is to group

the input sequence of the characters into toens. This is the ma,or

function. $ctually! as the reader will see! the le"ical analyDer does more

than identifying toens.

*ection .1 The Identification of Toens.

*o far as the language +2$35I* is concerned! the identification of

toens is trivial. 3ote that toens in +2$35I* are separated by

delimiters! such as


15/103

1/


16/103

3ote that for some other high%level languages! it is by no means easy to

identify toens. $ typical e"ample is +-2T2$3. In +ortran!


17/103

7elimiters

1

& (

)0 >

/ K

= %

E O

J P

↑

1L Q

11 R

1& :Table 1

*ection .&. The Types of Toens 2ecogniDed by the 6e"ical $nalyDer.

We have shown how the le"ical analyDer identifies toens. In this

section! We shall illustrate some other functions of the le"ical analyDer.

3ote that the ma,or ,ob of the le"ical analyDer is to prepare everything

for the synta" analyDer. Imagine that the synta" analyDer encounters the

following instruction:

F$2I$6# I3T#9#2 G! ! H

In this case! the synta" analyDer recogniDes that this instruction is a

declaration instruction because it detects the toen SF$2I$6#. In

+2$35I*! the word SF$2I$6# has its particular meaning and can

not be used in any other way. +or instance! one can not write the

following instruction:

1E


18/103

VARIABLE = VARIABLE + 1;

The word SF$2I$6# is a reserved word for +2$35I*. *imilarly!

the words SI+ ! ST;#3 ! S#6*# and so on are all reserved words in

+2$35I* which can not be used as ordinary variables.

It is appropriate to point out here that in +-2T2$3! there is no such a

reserved word concept. Indeed! one may have

I+ > I+ K 1

in +-2T2$3 and this is absolutely not allowed in +2$35I*.

*ince the recognition of reserved words is so important for the synta"

analyDer! it is appropriate for the le"ical analyDer to do the ,ob for it.

Table & contains all of the reserved words used in +2$35I*. *ince this

table is also1. $37&. --6#$3. 5$660. 7I'#3*I-3/. #6*#

=. #34E. #3*J. #U. 9#1L. 9T11. 9T-1&. I+1. I348T10. I3T#9#2 1/. 6$#61=. 6#1E. 6T1J. 3#

1. -2 &L. -8T48T&1. 42-92$'&&. 2#$6&. *82-8TI3#&0. T;#3&/. F$2I$6#

Table & (2eserved Word Table)

1J


19/103

searched very frequently! we may use hashing to store and retrieve data

for this table.

In addition to recogniDing reserved words! we may also recogniDe

numerical constants in the le"ical analyDer phase. This can be easily done.

If a toen starts with a number! it must either be a real number or an

integer. It is easy to differentiate these two because a real number

contains the S..

In summary! for any toen! the le"ical analyDer determines whether it

is a delimiter! a real number! an integer! a reserved word or an identifier.

We! of course! have never defined identifiers. $n identifier is a toen

which is not a delimiter! a real number! an integer or a reserved word. In

general! an identifier usually denotes a variable! an array! a label! the title

of a program or the title of a subroutine.

*ection .. The 2epresentation of Toens.

We now as the question: ;ow does the le"ical analyDer let the synta"

analyDer now what the type of a toen is .

efore answering this question! let us note another problem. -ur

toens are of different lengths. +or internal representation! this is not an

ideal situation.

3ote that a toen! after the le"ical analysis is e"ecuted! must belong to

one of the following types:

1


20/103

1. 7elimiters.

&. 2eserved words.

. Integers.

0. 2eal numbers.

/. Identifiers.

#ach class of toens has a table associated with it. We have already

shown that delimiters are contained in Table 1 and reserved words are

contained in Table &. We now discuss why the other toens also need

tables.

Whenever a toen is identified as an integer! we put this integer into

Table . This is necessary because finally! the code generator is going to

use this table to generate constants. +or instance! consider the following

instructions

G > /

> E

$fter the e"ecution of the le"ical analysis phase! Table will contain two

integers as below:

/

E

.

.

.

*imilarly! we shall designate Table 0 to contain real numbers.

+or each identifier! we shall put it into some location in Table /. the

Identifier Table. The Identifier Table is designed to contain three entries:

&L


21/103

the subroutine to which the identifier belongs! the type of the identifier

and a pointer pointing to some other tables. These will be e"plained in

detail later. 'eanwhile! we merely have to note the following:

(1) #very identifier is hashed into Table /.

(&) If an identifier appears more than once in the same subroutine! this

identifier is put into Table / only once.

() If the same identifier appears in different subroutines! it will

occupy different locations in Table /.

We shall use the second column entry of Table / to point to the subroutine

in which identifier appears.

6et us consider the following e"ample.

42-92$' '$I3

F$2I$6# I3T#9#2: 8! F! W! H

8 > /=

F > E

W > 8 K F

5$66 $1(W! 1=! H)

#34

*82-8TI3# $1(I3T#9#2: G! ! H)

H > G K&.E O

#3*

&1


22/103

In the above e"ample! there are two subroutines: '$I3 and $1 (The

main program is also considered to be a subroutine.). The identifiers

which occur in the main program are

'$I3

8

F

W

H

and the identifiers occurring in the subroutine $1 are

$1

G

H.

3ote that the identifier H occurs in both '$I3 and $1. It will appear

twice in Table /.

$fter the e"ecution of the le"ical analyDer! Table / may loo lie the

table shown below:Identifier *ubroutine Type 4ointer

1& W '$I30 H

/ F = G JEJ $1 8 J

1L J11 H

Table /

&&


23/103

5onsider the second entry of the above table. This entry contains

Identifier W which points to the third location of Table /. *ince the third

location of Table / contains '$I3! this identifier W appears in

*ubroutine '$I3. *imilarly! in the tenth location! we have Identifier pointing to the eighth location of this table. *ince the eighth location

corresponds to *ubroutine $1! appears in *ubroutine $1.

We have shown that each toen! after the le"ical analysis phase! is

found to be associated with a unique location of one of the tables from

Table 1 to Table /. The meaning of each table is summariDed as follows:

1. 7elimiter Table

&. 2eserved Word Table

. Integer Table

0. 2eal 3umber Table

/. Identifier Table.

We may say that each toen is characteriDed by two numbers i and , if

it occupies the ,th location of the ith table. Indeed! to solve the problem

that toens are of different lengths! we may simply use this &%tuple (i !

,)to represent this toen. 3ote that this representation is one%to%one in the

sense that given ( i ! , )! we can uniquely identify the toen. +or instance!

for the above e"ample! the &%tuple (/!&) denotes Identifier W and (/!1L)

denotes .

6et us now give a complete e"ample to illustrate our idea by

considering the following programs:

42-92$' '$I3

F$2I$6# I3T#9#2: 8! F! '

&


24/103

8 > /

F >E

5$66 *1(8! F! ')

#34

*82-8TI3# *1(I3T#9#2: G! ! ')

' > G K K&.E

#3*

$fter the e"ecution of the le"ical analyDer! Table will be as

follows:

1 /

& E

Table (Integer Table)

The 2eal 3umber Table will contain only one element as below:

1 &.E

Table 0 (2eal 3umber Table)

6et us assume that after identifiers are put into Table / Table /. loos

as follows:

Identifier *ubroutine Type 4ointer

1 8

&

'$I3

&0


25/103

0 1L

/ F

= '

EJ G 1L

' 1L

1L *1

Table / (Identifier Table)

The entire program will now be represented as shown below:

42-92$' '$I3

(&!&1) (/!)(1!1)

F$2I$6# I3T#9#2: 8 ! F ! '

(&!&/) (&!10) (1!1&) (/!1) (1!11) (/!/) (1!11) (/!=) (1!1)

8 > /

(/!1) (1!0) (!1) (1!1)

F > E

(/!/) (1!0) (!&) (1!1)

5$66 *1 ( 8 ! F ! ' )

(&!) (/!1L) (1!&) (/!1) (1!11) (/!/) (1!11) (/!=) (1!) (1!1)#34

(&!=) (1!1)

*84-8TI3# *1 ( I3T#9#2 : G ! ! ' )

(&!&) (/!1L) (1!&) (&!10) (1!1&) (/!J) (1!11) (/!0) (1!11) (/!) (1!) (1!1)

' > G K K &.E

(/!) (1!0) (/!J) (1!/) (/!0) (1!/) (0!1) (1!1)

#3*

(&!E) (1!1)

&/


26/103

*ection 0. The *ynta" $nalyDer

The functions of the synta" analyDer are as follows:

(1) The synta" analyDer divides the input toens into instructions.

(&) +or each instruction! the synta" analyDer determines whether this

instruction is grammatically correct.

() If the instruction is not grammatically correct! it is re,ected and

some error message is printed out. -therwise! it is decomposed into a

sequence of basic instructions each basic instruction is in a standard

form. In our case! we shall use quadruples to represent basic instructions.

(0) *ome other information obtained by the synta" analyDer will be put

into the various tables of the compiler.

The first ,ob of the synta" analyDer is to divide the input toen stream

into instructions. This is easy for +2$35I* because in +2$35I*! every

instruction is terminated by the delimiter S. We would lie to point out

that this is not the case in +-2T2$3! for instance! because +-2T2$3

assumes that an instruction is contained in one card unless in the ne"t

card! a symbol appears in the =th column. This maes the ,ob of

determining the termination of an instruction much harder.

&=


27/103

3ote that even in 4$*5$6! the separation of toens into instructions

is more complicated. +or instance! we do not allow an instruction as

follows:

F$2I$6# I3T#9#2 : G! ! H 2#$6 : 8! F

which is perfectly legal in 4$*5$6. In +2$35I*! we shall have two

instructions instead:

F$2I$6# I3T#9#2 : G! ! H:

F$2I$6# 2#$6 : 8! F

In 4$*5$6! compound statements are allowed. They start with #9

and end with #37. Within #9 and #37! there can be a sequence of

instructions! each of which is terminated with S. The reader should

realiDe that we deliberately designed +2$35I* to have such ind of

synta" rules so that a compiler can be easily written. In +2$35I*! we

only allow an I+ T;#3 #6*# instruction as follows:

I+ 4 $37 U T;#3 G > GK 1 #6*# G>G%1

3ote that the entire statement is only one instruction. It should not be

difficult for the reader to modify his +2$35I* compiler to handle an

instruction as follows:

I+ 4 $37 U T;#3 #9 G > G K 1 > G K & #37

#6*# #9 G > G K & > G K 6 #37

&E


28/103

*ection 0.1. The 2ecognition of 7ifferent Types of Instructions.

In +2$35I*! instructions may be classified into the following

categories:

(1)program heading instructions

(&)subroutine heading instructions

()variable type declaration instructions

(0)label declaration instructions

(/)dimension declaration instructions

(=)I+ T;#3 #6*# instructions

(E)9- T- instructions

(J)assignment instructions

()5$66 instructions

(1L)IP- instructions

#ach type of instruction has its own synta" rules. The synta" analyDer

has to recogniDe each instruction and branch to a subroutine to handle this

instruction. +or instance! consider a 9- T- instruction. This instruction

must start with a reserved word 9T-. This 9T- must be followed by a

label. This label must be of the type of an identifier. That is ! it must start

with a letter and followed by integers! letters or nothing. Then! finally! we

e"pect a S following this label. *ince we have stipulated that every label

must have been declared before it is used! we have to chec whether the

&J


29/103

identifier following 9T- is indeed a declared label or not. If the toen

following 9T- is not a identifier or it was not declared as a label before!

error messages will be printed.

6et us consider another case. Imagine that we have already recogniDed

the first toen to be the reserved word SF$2I$6#. We then chec

whether the ne"t toen is SI3T#9#2! S2#$6 or S--6#$3 If yes!

we go ahead to chec whether S: follows. If no! an error message is

printed. $fter finding S:! we e"pect a sequence of distinct identifiers!

none of which appeared before and each one is followed by S!. +inally!

we e"pect a S to terminate this instruction. +or each identifier appearing

in this instruction! we have to tae some semantic action which will be

e"plained in detail later. If the variables are not distinct! or some variable

is not followed by S! or some variable appeared before! error messages

will be printed. The entire process of analyDing a variable type

declaration instruction can be illustrated in +ig 0.1.

&


30/103

+I9. 0.1

L


31/103

The reader can see that it is absolutely necessary to recogniDe the

type of an instruction being analyDed. 6et us temporarily assume that

none of our instructions starts with a label (We shall discuss the handling

of labeled instructions later.).If this is the case! every instruction in+2$35I*! e"cept the assignment instructions! starts with a reserved

word. +or instance! the program heading instruction starts with the

reserved word 42-92$' and the I+ T;#3 #6*# instructions start with

the reserved word I+. In the following table! Table 0.1 we show how

every instruction! e"cept assignment instruction! starts with a particular

reserved word. If an instruction does not start with any reserved word!

then it must be an assignment instruction.

Instructions 2eserved Word

4rogram heading instructions 42-92$'

*ubroutine heading instructions *82-8TI3#

Fariable type declaration instructions F$2I$6#

6abel declaration instructions 6$#6

7imension declaration instructions 7I'#3*I-3

I+ T;#3 #6*# instructions I+

9- T- instructions 9T-

$ssignment instructions """

5all instructions 5$66

IP- instructions read I348T

IP- instructions went -8T48T

Table 0.1.

1


32/103

$gain! we lie to point out here that in +-2T2$3! we can not

identify the type of an instruction by checing the first toen! because

+-2T2$3 has no reserved words. ;owever! those who are familiar with

5--6 or $*I5 will be able to note that every 5--6 or $*I5instruction begins with a reserved word and this reserved word uniquely

determines the type of the instruction.

2emember that in our compiler! every toen is characteriDed by two

integers. To chec whether a toen is a reserved word or not! we merely

have to chec whether the first integer is & because Table & contains all of

the reserved words.

We assumed that labels did not e"ist in the previous discussions. This

is! of course! not a valid assumption because we do allow labels to appear

in instructions. $ctually! given an instruction! our first ,ob is to chec

whether the beginning toen is a label or not.

3ote that labels were not detected in the le"ical analysis phase they

were merely identified as identifiers. 3evertheless! in +2$35I*! it is

agreed that a label must have been declared before it is used. In other

words! there must be an instruction I declaring label 6 before it is used.

$fter instruction I is processed by the synta" analyDer! as the reader will

see later! the synta" analyDer will put a note in the Identifier Table to

declare that I identifier 6 is a label. This is done by putting an appropriate

entry in the type entry of the Identifier Table. 6ater! whenever we want to

now whether any toen is a label or not! we merely have to chec this

toen in the Identifier Table. If it is indeed a label! the type entry of this

toen in the Identifier Table will declare this fact.

&


33/103

$t the top of the synta" analyDer! the type of each instruction is

determined as follows:

(1)5hec the first toen of an instruction. If this is a reserved word! the

type of this instruction is determined by this reserved word.

(&)If the first toen is not a reserved word! chec whether this toen is

a label. If this is not a label! this instruction must be an assignment

instruction.

()If the first toen is a label! the type of this instruction is determined

by the second toen. If the second toen is a reserved word! then the type

of this instruction is determined by this toen. -therwise! it must be an

assignment instruction.

*ection 0.&. The 4rocessing of the 4rogram ;eading Instruction.

There can be only one program heading instruction in the entire

program and this instruction must appear as the first instruction. The

reserved word corresponding to this instruction is 42-92$'. $fter this

instruction is processed! the heading of this program will now be

classified as an identifier and the into the Identifier Table. In this

Identifier Table! we do the following:


34/103

(1)In the subroutine entry of the 4rogram heading in the Identifier Table!

we have nothing. 6ater! whenever we notice that nothing appears in the

subroutine entry of an identifier table! we now that this identifier is the

name of a subroutine. 3ote that the main program can also be consideredas a subroutine.

(&)In the pointer entry! we put a S1 there. We do this to indicate that

the quadruples corresponding to this program start from the first location

of the Uuadruple Table. The Uuadruple Table is now illustrated in +ig.

0.&. +or

4ointer

*1 V

' p l

* 6

*&

Identifier Table Uuadruple Table

+ig. 0.&

.

.

.

.

.

.

.

.

.

.

.

.

.

'ain 4rogram ' p

*ubroutine *1

*ubroutine *&

*ubroutine *

l

V

6

0


35/103

every subroutine! a sequence of quadruples are generated and stored

consecutively in the Uuadruple Table. In the Identifier Table! we have a

pointer for every subroutine pointing to the starting point of the

corresponding sequence of quadruples.

*ection 0.. The 4rocessing of variable Type 7eclaration Instructions.

The reserved word to identify a variable type declaration instruction

is F$2I$6#. $fter processing such an instruction! the following actions

will be taen:

(1)6et G1，Ｘ &，……，Gn be variables declared in this instruction. In

the subroutine entry of Gi in the Identifier Table! put a pointer pointing to

the subroutine it appears in. That is! suppose G i belongs to *ubroutine *

which appears in 6ocation 6s in the Identifier Table! then put 6s in the

subroutine entry corresponding to Gi in the Identifier Table.

(&) 4ut a code indicating the type of Gi in the type entry of Gi in the

Identifier Table. We may arbitrarily set the codes as follows:

$rray 1

oolean &

5haracter

Integer 0

6abel /

2eal =

/


36/103

If the synta" analyDer finds out that the type of Gi is boolean! S& will be

put into the type entry of Gi in the Identifier Table.

()+or each variable Gi! a quadruple is generated in the Uuadruple

Table. This quadruple will be used by the code generator later to assign

the necessary space. 6et the location in the Identifier Table occupied by

Gi be 6i. Then the quadruple corresponding to Gi is ((/!6i)! ! ! ).

6et us imagine that a certain variable G is declared as a real variable

and the location occupied by this variable is 1/ in the Identifier Table.

Then the quadruple corresponding to this variable is ((/!1/)! ! ! !). In the

type entsy! we have a S=! indicating that it is a real variable. The code

generator! after noting the &%tuple (/!1/)! will e"amine the 1/th location

of the Identifier Table. It will notice the e"istence of S= in the type entry

and thus will generate the following assembly language codes:

G 7* &

because a real number variable will occupy two memory locations.

*ection 0.0. The 4rocessing of 7imension Instructions.

=


37/103

$ dimension declaration instruction declares arrays. +or every array

declared! we have to record the type of the array! the dimensionality of

the array and the siDe of the array in each dimension. +or instance!

suppose the siDe of the array in each dimension. +or instance! suppose the

siDe of a real array is declared as (1=!/). We now have to record the fact

that the array is real! the dimensionality of this array is &! the siDe of the

first dimension is 1=! and the siDe of the second dimension is /. To store

this information! we need a new table! called Information Table. We shall

denote is Table E. Table E is a one%dimensional table . In the above case!

the information will be stored as follows:

.

.=

&

1=

/

.

.

.

Information Table

In the Identifier Table! a pointer will be made to point to the Information

Table as follows:

X2eal $rray

X7imensionality

XThe siDe of the first dimension

XThe siDe of the second dimension

E


38/103

*ubroutine

$ 1 &1

Identifier Table

6et us summariDe the actions taen by the synta" analyDer to handle a

dimension declaration instruction as follows:

(1)*uppose $ is declared as an array. In the subroutine entry of $ in

the Identifier Table! 4ut an appropriate pointer to indicate the subroutine

it belongs to.

(&)In the type entry of $ in the Identifier Table! put S1 into it!

indicating that $ is an array.

()*uppose that the type code of an array is ,! the dimensionality of

$ is i and the siDes of $ corresponding to different dimensions of $ are

*1， *& ， *i respectively. 4ut *1， *&， *&，… ...*i into the

Informations Table (Table E). *uppose! in the Information Table ! the ne"t

available location is 3! then the information ,. i，*1，*&，….. ，*i will

be put into locations 3! 3K1! . ! 3KiKl respectively. 4ut 3 into the

pointer entry of $ in the Identifier Table.

(0)In the Uuadruple Table ! generate a quadruple in which the first

element identifies the location occupied by $rray $ in the Identifier

Information Table

&1

&&

&

&0

J


39/103

Table.


40/103

That is ! if the array occupies the /th location of the Identifier Table!

then the quadruple generated will be

((/!/)! ! ! )

The information Table will not only contain information concerning

arrays. $s we shall see later !when we want to store other types of

information! such as the information concerning parameters of a

subroutine! we can also use this table. We may view the Information

Table as a warehouse where many people can store things. The only thing

that we have to do is to eep pointers pointing to the location where

information is stored.

The reader may wonder why we donRt eep separate tables for

different types of information. +or instance! why donRt we eep on table

to store information concerning arrays and another table for information

concerning subroutinesM 3ote that if we eep several tables! we have to

reserve quite a lot of memory locations for these tables. That is! we have

to eep one array for each table. *ince we do not now how large the siDe

of each table should be! we are forced to eep many rather large arrays

and later we may find out that many arrays are largely empty and a lot of

memory space is wasted. y eeping one information table! we have

decreased the degree of memory space wasting.

0L


41/103

There is another important point to be made here. *uppose we write the

compiler in +-2T2$3. We certainly can not have an array which will

store integers as well as characters. The Information Table! Table E!

contains only integers. The this is possible is due to the clever design tomae sure that every piece of information is represented by integers.

*ection 0./ The 4rocessing of 6abel Instructions.

6abel instructions are easy to identify they all start with the reserved

word 6$#6. $fter recogniDing a label instruction! the following actions

will be taen.

(1)$rrange a pointer to be put into the subroutine entry of this label in

the Identifier Table.

(&)In the type entry! put


42/103

.

.

.

6$#6 61LL

.

.

.

61LL G>KH

In this case! the following events must tae place:

(1)The le"ical analyDer recogniDes 61LL as an identifier and hashes it

into the Identifier Table.

(&)The synta" analyDer recogniDes the reserved word 6$#6 and

consequently decides that Identifier 61LL must be a label. It then

indicates so by putting


43/103

6ater! when the code generator encounters this quadruple! it will realiDe

that the ne"t assembly language codes ne"t generated should be labeled.

+or

G>KHthe assembly language codes are

67

$7 H

*T G

ecause this instructions is labeled! the code generator will generate a

special label! say 6! in this case. Thus we may have

6 67 $7 H

*T G

Imagine that we later encounter an instruction as

9T- 61LL

In this case! the synta" analyDer checs the pointer entry of 61LL in the

Identifier Table and notices that the starting quadruple corresponding to

the instruction labeled by 61LL starts from location 3 in the Uuadruple

Table. It will therefore generate a quadruple e"pressing


44/103

We hope that the reader can appreciate the rule that we do not use

integers to represent labels! as is done in many languages. In +2$35I*!

every integer recogniDed in the le"ical analysis phase is indeed a number

and it is therefore appropriate to put it into the Integer Table. If we also

used numbers to represent labels! then we have to do some synta"

analysis even in the le"ical analyDer which means that the le"ical

analyDer and the synta" analyDer can not be separated clearly.

*ection 0.E The 4rocessing of $ssignment Instructions.

We are now going to discuss the most interesting and important part of

the synta" analyDer: the processing of assignment instructions. $s we

noted before! assignment instructions are the only instruction which do

not start with a reserved word and must contain an < sign.

In the following discussion! we shall first assume that the assignment

instructions contain arithmetic e"pressions only. The method that we

present can be easily e"tended to handle assignment instructions with

logical e"pressions. esides! let us temporarily assume that arrays do not

appear in the assignment instructions.

5onsider the instruction

G>KH

00


45/103

It is very easy for the synta" analyDer to recogniDe the operator


46/103

(*,U,V,T1)

(K!!T1!G)

where T1 is a temporary variable.

-ur question is :;ow does the computer now that we should first

e"ecute multiplication and then addition？

This problem can be solved by converting an assignment instruction

into its 2everse 4olish 3otation. To obtain the 2everse 4olish 3otation

of an assignment instruction involving arithmetic operations only! we

shall first assume that there is an ordering of operators as follows:

O!P

K!%

( !)

>

The procedure to generate the 2everse 4olish 3otation for an assignment

instruction is as follows:

(1)The symbols of the source string move left towards the ,unction(*ee

+ig. 0.).

(&)When an operand (In our case! an operand is a variable.) arrives! it

passes straight through the ,unction.

()When an operator (In our case! an operator is a delimiter) arrives! it

pauses at the ,unction. If the stac is empty! this incoming operator slides

down into the stac.

0=


47/103

-therwise! it loos at the top of the stac. While the priority of the

incoming operator is not greater than that of the operator at the top of the

stac! the operators in the stac will pop out and move to the left of the

,unction one by one. $fter the above process is finished! the incomingoperator is pushed into the stac.

(0)(always goes down to the stac.

(/)When) reaches the ,unction! it causes all of the operators in the stac

bac as far as (to be shunted out and then both parentheses disappear.

(=)When all symbols have been moved to the left. pop out all of the

operators in the stac to the left ,unction.

6et us consider the case of

G>K8OF

The steps of obtaining the 2everse 4olish 3otation of the above

instruction is illustrated in +ig. 0..

G>K8OF G! > K8OF

(*T$5)

(a) (b)

G! K8OF G! K 8OF

> >

(c) (d)

0E


48/103

G! 8OF G!!8 O F

K K

> >

(e) (f)

G!!8! F G!!8!F

O O

K K

> >

(g) (h)

G!!8!F!O!K! G!!8!F!O!K!>

>

(i) (,)

+ig. 0.

The transformed 2everse 4olish 3otation is thus

G!!8!F!O!K!>.

$fter transforming an assignment instruction into its 2everse 4olish

3otation! one can decompose it into a sequence of basic instructions very

easily. The procedure is as follows: (We assume that every operator is a

binary operator here. It is easy to e"tend this method to handle unary

operators.)

*tep 1. 6et i>1.

*tep &. 4ic the first operator from left to right. 6et this operator be Oi .

0J


49/103

*tep . *can bac and pic up two operands immediately preceding Oi .

6et these two operands be F1 and F2 . 9enerate a basic

instruction Oi iT V V =)!( &1 .

*tep 0. If the last symbol has already been scanned! stop. -therwise! let

i>iK1 and go to *tep &.

6et us consider the assignment instruction.

G>K8OF

again. The 2everse 4olish 3otation of the above instruction is

G!!8!F!O!K!>.

$ sequence of basic instructions are now generated:

asic Instruction Uuadruple

G!!8!F!O!K!> T1>8OF (O!8!F!T1)

G!!T1!K!> T&>KT1 (K!!T1!T&)

G!T&!> G>T& (>!T&! !G)

6et us consider another e"ample:

G>(K8)OF

The 2everse 4olish 3otation of the above instruction is obtain as follows:

G>(K8)OF G! >(K8)OF

(a) (b)

0


50/103

G! (K8)OF G! ( K8)OF

> >

(c) (d)

G! K8)OF G! K 8)OF

( (

> >

(e) (f)

G!! 8)OF G!!8! ) OF

K K

( (

> >

(g) (h)

G!!8!K! OF G!!8!K O F

> >

(i) (,)

G!!8!K F *!G!8!K!F

O O

> >

() (l)

G!!8!K!F!O G!!8!K!F!O!>

>

(m) (n)

+ig. 0.0

/L


51/103

The 2everse 4olish 3otation for

G>(K8)OF

is thus

G!!8!K!F!O!>

$ sequence of basic instructions are generated as follows:

asic Instructions Uuadruples

G!!8!K!F!O!> T1>K8 (K!!8!T1)

G!T1!F!O!> T&>T1OF (O!!T1!F!T&)

G!T&!> G>T& (>!T&! !G)

In the previous discussion! we have only dealt with assignment

instructions with arithmetic operations. In +2$35I*! we allow logical

assignments as well. That is! we may have an instruction as

G>4 $37 U -2 2

where 4!U and 2 are boolean variables. It should be easy for the reader to

see that the above instruction can be transformed into its 2everse 4olish

3otation as follows:

G!4!U! $37! 2! -2!>

if we agree that $37 should be performed before -2. The basic

instructions are generated as below:

/1


52/103

asic Instructions Uuadruples

G!4!U!$37!2!-2!> T1>4 $37 U ($37!4!U!T1)

G!T1!2!-2!> T&>T1 -2 2 (-2!T1!2!T&)

G!T&!> G>T& (>!T&! !G)

*ection 0.J The 4rocessing of Instructions 5ontaining $rrays.

In the previous discussions! we assumed that arrays did not e"ist. If

they do e"ist. If they do e"ist! we have to tae special action. 5onsider

the following instruction:

$(I)>(&!)K0

In this case! the parameters appear because of the arrays $ and and they

must be eliminated first.

3ote that arrays must have been declared before they are used. 6et us

assume that $ and appear in the Eth and the 1=th locations in the

Identifier Table respectively. Then $ is represented by (/!E) and is

represented by (/!1L). In the Identifier Table! both of these identifiers

will be indicated to be arrays. The synta" analyDer! when it encounters

the symbol (/!E)! will now e"amine the Eth location of Table / and will

find out that this identifier represents a array. Immediately! it will branch

to a subroutine to handle this array.

6et us consider a simple e"ample first:

G>(I!V)K0

/&


53/103

In this case! we want to replace (I!V) by a temporary variable. 6et us

assume that $rray was defined as follows:

7I'#3*I-3 I3T#9#2:(!)

The traditional method of memory management will arrange the array as

follows:

(1!1)

(&!1)

(!1)

(1!&)

(&!&)

(!&)

(1!)(&!)

(!)

That is ! (I!V) will be the ((V%1)'KI)th location of the array if the

dimensionalities of are ' and 3 respectively. Thus! to replace (I!V)!

we have to generate the following quadruples:

(%!V!1!T1) T1>V%1

(O!T1!'!T&) T&>T1O'

(K!T&!I!T) T>T&KI

+inally! we need the following quadruple:

(>!!T!T0)!

/


54/103

which is interpreted as

T0>(T).

The assembly language codes corresponding to

(>!!T!T0)

are

67 KT%1

*T T0

The entire sequence of quadruples for

G>(I!V)K0

are(%!V!1!T1) T1>V%1

(O!T1!'!T&) T&>T1O'

(K!T&!I!T) T>T&KI

(>!!T!T0) T0>(T)

(K!T0!0!T/) T/>T0K0

(>!T/! !G) G>T/

6et us consider an e"ample in which an array appears at the left side of

the S> sign:

$(I)>(I!V)

In this case! (I!V) is replaced by a temporary variable the same way as

we discussed before. *uppose the temporary variable is T0. We then

have the situation as follows:

$(I)>T0

/0


55/103

The reader can see that we have three inds of quadruples where the

operators are all S>:

(>! !G) G>

(>!$!I!G) G>$(I)(>!G!$!I) $(I)>G.

That the 5ode 9enerator will not be confused is due to the fact that $ is

a declared array while G and are not.

We discussed how to handle the &%dimensional integer arrays. It

should be easy for the reader to e"tend the above idea to three! or four

dimensional arrays. It should not be difficult to handle real arrays either.

*ection 0.. The 4rocessing of I+ T;#3 #6*# Instructions.

The I+ T;#3 #6*# instructions can be easily recogniDed because they

all start with the reserved word I+. $ general format of this ind of

instruction is

I+ # T;#3 I31 #6*# I3 2

where # is a logical e"pression and I31 and I3 2 are two simple

instructions.

The synta" analyDer! after analyDing the above instruction! would

generate the following quadruple:

(I+!4!U41 !U42 )

//


56/103

where 4 is a logical variable! evaluated to 1 or L! depending on the

e"pression # and the values of its variable during the run time and U41 is

the starting quadruple corresponding to I31 . This quadruple will be

interpreted as SIf 4 is true! e"ecute the codes corresponding to U41

otherwise! e"ecute codes corresponding to U42 . *uppose the quadruple

corresponding to I31 starts from the 1=th location of the Uuadruple Table

(Table=)! then U41 will be represented by (=!1=).

6et us illustrate these ideas by an e"ample. 5onsider

I+ 4 T;#3 G>GK1 #6*# G>GK&

The synta" analyDer will generate the following quadruples! assuming

that the quadruples will start from the 1/th location of the Uuadruple

Table.

1/ (I+!4!(=!1=)!(=!1))

1= (K!G!1!T1)

G>GK1

1E (>!T1! !G)

1J (9T-! ! !(=!&1))

1 (K!G!&!T&)

G>GK&

&L (>!T&! !G)

When the first quadruple of the above sequence is generated! the synta"

analyDer nows that U41 will start from the 1=th location! but it does not

/=


57/103

now where U42 will start. Therefore! the synta" analyDer will have to

wait for the generation of the 1th quadruple. When this quadruple is

generated! the synta" analyDer comes bac to Uuadruple 1/ and fills in

(=!1). The situation is the same for Uuadruple 1J. When this quadruple

is generated! the synta" analyDer does not now how long the sequence of

quadruples corresponding to G>GK& will be. It therefore waits until all

of the quadruples corresponding to G>GK& are generated.

6et us consider a more complicated case:

I+ 4 $37 U T;#3 G>GK1 #6*# G>GK&

In this case! we first have to evaluate 4 and U! assuming that 4 and U

are boolean variables. The quadruples for this instruction with be as

follows:

1/ ($37!4!U!T1Y 1= (I+!T1!(=!1E)!(=!&L))

1E (K!G!1!T&)

1J (>!T& !G)

1 (9T-! ! !(=!&&))

&L (K!G!&!T)

&1 (>!T! !G)

Uuadruple 1/ means that the value of T1 is the $37 of 4 and U. the

assembly language codes for this quadruple are:

1!7 4

$37 U

*T T

/E


58/103

In general, a logical expression can be as coplica!e" as #ollo$s%

G 9T -2 6T H $37 8 #U F.

To evaluate the above e"pression! we may use the 2everse 4olish

3otation to transform it into the following form (We have assumed that

we should evaluate $37 operations before -2 operations.):

G!!9T!!H!6T!8!F!#U!$37!-2

The sequence of quadruples generated will be

(9T!G!!T1)

(6T!!H!T&)

(#U!8!F!T)($37!T&!T!T0)

(-2!T1!T0!T/)

*ection 0.1L. The 4rocessing of 9o T- Instructions.

It is very simple to process 9- T- instructions. +or9T- 6G

the synta" analyDer first has to find out whether 6G is a label or not. This

can be done by checing through the Identifier Table. If it has not been

declared as a label or it was declared to be something else! we produce an

error message. If 6G is indeed a label! we chec whether this instruction

with the label has appeared before. If it appeared! we may generate the

following quadruple:

(9T-! ! !U 4G )

where U4G is the starting quadruple corresponding to the instruction

labeled with 6G. -therwise! we have to wait until the synta" analyDer

encounters the instruction with the label. If this instruction never appears

/J


59/103

or it appears more than once! we also produce error messages.

*ection 0.11 The 4rocessing of 5$66 Instructions.

When a 5$66 instruction is encountered! the synta" analyDer will have

to record the name of the subroutine being called and the arguments

appearing in the instruction. To do this! we again use the Information

Table.

$ 5$66 instruction is of the following form:

5$66 *1( 4 4 4 31 2! !....! )

The quadruple corresponding to the above instruction will be of the

following form:

/


60/103

.

.

.

.(5$66! *1! !(E!')) ' 3 The number of parameters

4ointer pointing to 41

. .

. .

. .

.

4ointer pointing to 4 3

.

.

Information Table (Table E)

*ince we store identifiers and numbers in tables! we have to use two

numbers to represent each 4i . 6et us assume that our 5$66 instruction is

5$66 *1(W!1=!$!/E!)

*uppose that *1! W and $ occupy the =th! 1/th and &Eth locations of the

Identifier Table (Table /) respectively. *uppose 1= occupies the rd

location of the Integer Table (Table)! and /E. occupies the &nd location

of the 2eal Table (Table 0). 3ote that the reserved word 5$66 occupies

the rd location of the 2eserved Word Table (Table &). 6et us assume

that the ne"t available location in the Information Table is /. The

quadruple corresponding to the above instruction will be as follows:

=L


61/103

.

.

.

.((&!)(/!=)! !(E!/)) / 0 3umber of parameters/ W1/

1=/&E $

0& /E.

Table E

*ection 0.1&. #"amples.

In this section we shall present two e"amples designed to show how

quadruples are generated and how the various tables will loo after the

synta" analyDer is e"ecuted.

#"ample 0.1

42-92$' $1

F$2I$6# I3T#9#2:G!!I

7I'#3*I-3 I3T#9#2:$(1&)

6$#6 61! 6&

I>1

G>/>11

61 I+ (G 9T ) T;#3 (9T- 6&) #6*# G>GK&

$(I)>G

I>IK1

9T- 61

6& #34

=1


62/103

*82-8TI3# T4# 4-I3T#2

1

& I / 0

0

/ $1 1 (Uuadruple Table)

=

E $ / 1 1 (Information Table)

J G / 0

1L

11 / 0

1&

1

10 61 / / 1L (Uuadruple Table)

1/ 6& / / 1J (Uuadruple Table)

Identifier Table (Table/)

1 1& 1 0

& 1 & 1 $rray $

/ 1&

0 11

/ & Information Table (Table E)

Integer Table (Table)

=&


63/103

1 ((/!J)! ! ! ) G

& ((/!11)! ! ! )

((/!&)! ! ! ) I

0 ((/!E)! ! ! ) $/ ((/!10)! ! ! ) 61

= ((/!1/)! ! ! ) 6&

E ((1!0)!(!&)! !(/!&)) I>1

J ((1!0)!(!)! !(/!&)) G>/

((1!0)!(!0)! !(/!J)) >11

1L ((&!1L)!(/!J)!(/!11)!(L!1)) T1>G 9T

11 ((&!1&)!(L!1)!(=!1&)!(=!1)) I+ T1 9- T- 1&! #6*# 9- T- 1

1& ((&!11)! ! !(=!1J)) 9T- 6&

1 ((1!/)!(/!J)!(!/)!(L!&)) T&>GK&

10 ((1!0)!(L!&)! !(/!J)) G>T&

1/ ((1!0)!(/!J)!(/!E)!(/!&)) $(I)>G

1= ((1!/)!(/!&)!(!&)!(/!&)) I>IK1

1E ((&!11)! ! !(=!1L)) 9T- 61

1J ((&!=)! ! ! ) 6"

Uuadruple Table (Table =)

#"ample 0.&.

42-92$' $&

F$2I$6# I3T#9#2: I!V!7I'#3*I-3 I3T#9#2: $(&L)!(0!/)

I>&

V>

5$66 $(I!V!)

$()>(I!V)K&.E

#34

*82-8TI3# $(I3T#9#2:G!!)

F$2I$6# I3T#9#2:H

H>=>(G%H)O&K

#3*

=


64/103

*ubroutine Type 4ointer Type 4ointer

1

& G 11 0

I = 00

/ $ = 1 1 (Information Table)

= $& 1 (Uuadruple Table)

E = 0

J 11 0

= 1 0 (Information Table)

1L V = 0

11 $ 1= (Uuadruple Table)

1&1 11 0

10

1/ H 11 0

1=

Identifier Table (Table /)

1 0

& 1 $rray $

&L

0 0/ & $rray 1 &L 1 &.E

= 0 & 0

E / /

J 0 & 2eal Table

/ I / (Table 0)

1L = =

11 / V $(I!V!) E 1

1& 1L

1 / Integer Table

10 E (Table )

Information Table

(Table E)

=0


65/103

Uuadruples:

1 ((/!)! ! ! ) I

& ((/!1L)! ! ! ) V

((/!E)! ! ! ! )

0 ((/!/)! ! ! ! ) $

/ ((/!)! ! ! ! )

= ((1!0)!(!0)! !(/!)) I>&

E ((1!0)!(!/)! !(/!1L)) V> 4rogram $&

J ((&!)!(/!11)!!(E!J)) 5$66 $(I!V!)

((1!=)!(/!1L)!(!E)!(L!1)) T1>V%1

1L ((1!E)!(L!1)!(!&)!(L!&)) T&>T1O0

11 ((1!/)!(/!)!(L!&)!(L!)) T>IKT&

1& ((1!0)!(/!)!(L!)!(L!0)) T0>(T)

1 ((1!/)!(L!0)!(0!1)!(L!/)) T/>T0K&.E

10 ((1!0)!(L!/)!(/!/)!(/!E)) $()>T/

1/ ((&!=)! ! ! ) #34

1= ((/!&)! ! ! ) G

1E ((/!1)! ! ! )

1J ((/!J)! ! ! )

1 ((/!1/)! ! ! ) H

&L ((1!0)!(!=)! !(/!1/)) H>=

&1 ((1!=)!(/!&)!(/!1/)!(L!=)) T=>G%H *ubroutine $

&& ((1!)!(L!=)!(!0)!(L!E)) TE>T=K&

& ((1!/)!(L!E)!(/!1)!(L!J)) TJ>TEK

&0 ((1!0)!(L!J)! !(/!J)) >TJ

&/ ((&!E)! ! ! ) #3*

=/


66/103

*ection /. 5ode 9eneration

The function of a code generator of a compiler is to generate assembly

or machine language code. 5onceptually! this is the easiest phase of the

compiler writing. In practice! it is always quite difficult to write because

it is heavily machine%dependent. -ne must be able to master the target

assembly language! the operating system! the linage editor! the file

system and many other features of the target machine. In order to

simplify our discussion! we assume that we are implementing our

compiler on a very simple machine whose assembly language instructions

are shown below:

$77 add

$37 and operation

$29 argument

3 branch if negative

4 branch if positive

H branch if Dero

75 define constant

7IF divide

7-* define storage

#G4 e"ponential

#37 end

I35 increase by one

V'4 unconditional ,ump

==


67/103

&'B p !o sbro!ine

67 load

67I load%indirect

'4 multiply

3-4 no operation

-2 or operation

2T3 return

*T store

*TI store%indirect

*8 subtract

GI input

G-2 e"clusive of operation

G4 output

We shall first assume that our language is non%recursive.

The code generation of recursive language will be discussed later. 6et

us consider #"ample 0.&. the code generator will first e"amine Table

and 0 and generate the following instructions:

I1 75 &L

I& 75 0

I 75 /

I0 75 &

I/ 75

I= 75 =

IE 75 1

21 75 &.E

=E


68/103

Then the code generator will e"amine the quadruples one by one and

generate a sequence of instructions for each quadruple. +or instance!

consider the first quadruple. This quadruple is

((/!)! ! ! )$fter e"amining this quadruple! the code generator will find out that this

quadruple calls for the following instruction:

I 7* 1

because the third element in Table / corresponds to an identifier I which

is an integer occupying one word of memory space.

5onsider Uuadruple 0. This quadruple is

((/!/)! ! ! )

We shall first find out that (/!/) corresponds to an array! as indicated by

its type in Table /. The other characteristics of this array can be found by

its pointer pointing to Table E. In Table E! we will note that $rray $ is a

one%dimensional array and occupies twenty memory locations. Thus! the

code generator will generate the following assembly language instruction:

$ 7* &L

*imilarly! the assembly language instruction for

=J


69/103

Uuadruple

((/!)! ! ! )

will be

7* &L

5onsider Uuadruple =! which is

((1!0)!(!0)! !(/!)).

(1!0) will be found to correspond to S>! (!0) will be found to correspond

to S& and (/!) will be found to correspond to SI. Thus this quadruple

will be translated into the following assembly language instructions:

67 I0

*T I

similarly! Uuadruple 11 will be translated into

67 I

$77 T&

*T T

=


70/103

*ection /.1 The 5ode 9eneration for *ubroutine 5alls

6et us consider Uuadruple J of #"ample 0.&:

((&!)!(/!11)! !(E!J)).

The first &%tuple corresponds to 5$66 and (/!11) corresponds to $.

Thus the first assembly language instruction corresponding to the above

quadruple will be

V* $.

$ subroutine call will usually invoe the passing of arguments. The

arguments of this ,ump%to%subroutine instruction can be found by

e"amining the last element of the quadruple. *ince the last elements is

(E!J)! we e"amine the Jth element of Table E. This will inform us that

this calling instruction involves three parameters! namely I! V! and .

There are many methods of passing the arguments. 6et us introduce

the simplest ind first.

'ethod 1%%5all by $ddress

In this method! after the

V* $

instruction! we shall have a sequence of instructions to store the addresses

of arguments which are to be passed. Thus the instructions will be as

follows:

EL


71/103

&'B A

$29 I

$29 V

$29

$fter these instructions! an instruction corresponding to

T1>V%1

will follow. That is! we shall have a sequence of assembly language

instructions as follows:

V* $

61 $29 I

6& $29 V6 $29

60 67 V

*8 -3#

*T T1

When the program is finally e"ecuted! the addresses of I!V and will be

stored in 61! 6& and 6 respectively.

6et us now tae a loo as to how the arguments are passed to the

called subroutine $! in this case. $t the very beginning of the

instructions corresponding to $! we shall have a define%storage

instruction

$ 7* 1

When the calling instruction

E1


72/103

&'B A

is e"ecuted! the machine will store the address immediately after the V*

instruction! namely 61! into the location $. This action serves to build a

lin between a calling instruction and the called subroutine. $s will

become clear later! the arguments as well as the return address are all

passed through this mechanism.

3ote that in #"ample 0.&! we have to pass parameters I!V! and to G!

and (These three variables are local ones.) respectively. To pass I to G!

we may use the following instructions:

67I $

*T G

The load%indirect instruction stores 61 into the register. *ince the

address of I is stored in 61! the store instruction puts the address of I into

location G.

The other parameters can be transferred in a similar way +or instance!

the transferring of V to will be as follows:

I35 $

67I $

*T

To transfer the last argument! we do the following:

I35 $

67I $

*T

E&


73/103

$t the end of the code corresponding to the subroutine $! there should

be a return instruction:

I35 $

2T3 $

3ote that the final $ point to 60 and control will eventually return to 60!

as e"pected.

It is important to note that for every parameter inside a subroutine!

which is not an ordinary variable! its address! not its value! is stored.

Thus every time that the subroutine refers to a parameter! it should use an

indirect command so that the command will actually be referring to the

variable in the calling procedure. +or e"ample! if is a parameter! then

the statement

>K1

is translated into

67I

$77 -3#

*TI

Through this way! the argument is changed in the calling program

which is desired.

E


74/103

'ethod &.%%5all by Falue

The above method of passing parameters during subroutine calls is

called Scall by address because the address of a variable is used. *ince

the address is passed! it may sometimes cause trouble. +or instance! let

us consider the calling of $$771(G) again.

*82-8TI3# $771(I:Integer)

I>IK1

2#T823

#37

There is no problem if we have the following instructions:

G>&

5$66 $771(G)

-8T48T G

In this case! S will be printed out.

*uppose we have the following program:

5$66 $771(&)

V>&

-8T48T V

In this case! after

5$66 $771(&)

is e"ecuted! the constant S& will become S. Therefore

E0


75/103

V will not be set to &. Instead! it will become ! which may be very

baffling to many naive users.

In contrast to Scall by address! Scall by value avoids the above

problem because only the temporary values of the parameters are passed.

Thus the changes to the parameters will be local within the subroutine.

They will not have any lasting effect.

To implement the Scall by value method! we shall store the values!

instead of the addresses! of the parameters! immediately after the ,ump%

to%subroutine instruction. +or instance! in the above case! we shall have

V* $

61 75 I

6& 75 V

6 75

Within the subroutine ! we may use the load%indirect instruction to

transfer the parameters to local variables. $ll instructions in the

subroutine would then refer to these local variables.

$ctually! if the language uses call by address! it is easy to simulate call

by value by e"plicitly having the subroutine copy the arguments into local

variables. The subroutine would then only refer to these copies! and

never to the original variables.

-ne simple way to protect the variables from the undesirable effect

of call by address is to simply rename the variables. +or e"ample! we

may do the following:

E/


76/103

>G

5$66 $771()

or the subroutine may use a local variable with the statement

>G

as the first statement and all GRs replaced by Rs in the program body.

Then the calling programs would not need >G and G is not changed by

the subroutine call.

In the following! we shall assume that call by address is used unless

stated otherwise.

E=


77/103

*ection /.& 5ode 9eneration for 2ecursive 6anguages

*o far we have assumed that within a *ubroutine *! there is no calling

of *. If a subroutine calls itself! what will happen？ To mae sure that

the reader understands this problem! let us first consider the following

problem.

*uppose that we lie to compute the factorial function defined as below:

+$5T-2I$6 (1)>1

+$5T-2I$6 (3)>3O+$5T-2I$6 (3%1)

where 3 is a positive integer.

The following +2$35I* program correctly computes the above

function:

42-92$' +$5T-2I$6

F$2I$6# I3T#9#2: 3

F$2I$6# 2#$6: +$5I348T 3

5$66 +*(3!+$5)

$71 -8T48T +$5

#37

*82-8TI3# +*(I3T#9#2: I!2#$6:2)

F$2I$6# I3T#9#2:'

6$#6 61

I+ I 6# 1 T;#3 9T- 61'>I%1

EE


78/103

5$66 +*('!2)

$72 61 I+ I 6# 1 T;#3 2>1.L #6*# 2>2OI

#3*

We have now denoted two addresses: $71 and $72 . $71 is the return

address for 5$66 +* in the main program and $72 is the return

address for 5$66 +* within itself. Imagine that the first 5$66 +* is

e"ecuted. In this case! $71 is stored indirectly as the return address.

Then! within this subroutine! we have a 5$66 +* instruction. This

time! $72 is the return address. *ince there is only one memory location

prepared for the return address of *ubroutine +*! $72 will necessarily

override $71 ! in some sense. This is quite undesirable because we now

that the control has to go eventually bac to $71 .

'any programming languages! such as +-2T2$3! solve the above

mentioned problem by simply prohibiting a subroutine from calling itself.

$ programming language which allows a subroutine to call itself is called

a recursive programming language. $69-6! 6I*4 and 4$*5$6 are all

famous recursive programming language. We would lie to emphasiDe

here that a programming language can be easily made to be recursive. In

fact! the authors have used many +-2T2$3 compilers which can handle

recursive calls.

6et us now see why our program does compute the factorial if our

compiler is appropriately implemented. Imagine that

EJ


79/103

E


80/103

3 is equal to . The following actions should tae place:

(1)+* is first called! with 3 being equal to and +$5 unnown. The

return address is $71 .

(&)Inside +*! I is now set to be . *ince it is not less than or equal to

1!' is set to be &. +* is called again and the return address is set to be

$72 . It is important to note that I will be equal to . To simplify the

discussion! let us denote this return address as $721 .

()$gain! inside +*! I is now equal to &! ' is set to be 1 and +* is

called. The return address is set to be $72 with I equal to &. To simplify

the discussion! let us now call this return address as $722 .

(0)+inally! inside +*! I is equal to 1. 5ontrol goes to $72 and 2 is

computed to be equal to 1.L.

(/)5ontrol then goes to $722 . *ince I is equal to &! 2 is computed to

be equal to &.

(=)5ontrol then goes to $721 . *ince I is equal to and 2 is equal to &!

2 is computed to be equal to .

(E)The value of +$5 now becomes = and control goes to $71 .

There is another problem in recursive program language compiler

design. In a non%recursive language compiler! a variable is assigned a

fi"ed memory location. +or a recursive language compiler! this obviously

can not be done. 5onsider the variable I within +*. When the first time

the subroutine is called! I is equal to. This value has to be saved in some

JL


81/103

way because we are going to use it later when control comes bac to $72

. If I is assigned to a fi"ed location! we shall have serious problems.

To facilitate these recursive language properties! we may use a stac.

7uring the run time! each time a subroutine is called! an activation record

containing arguments and the appropriate return address is pushed into a

stac. To build a lin between the calling instruction and the called

subroutine! a global variable pointing to the current activation record may

be used. That is! this global variable changes each time a subroutine is

called. $ll of the return addresses and parameters can be found by using

this global variable. 6et us now denote this global variable 582$5T

(current activation record). We shall also use another global variable

T-4*T$5 to denote the top of the stac.

5onsider the above factorial program with the input 3 equal to .

When the first

5$66 +*

is e"ecuted! an activation record will be pushed into the stac as shown

below:

J1


82/103

T-4*T$5

+* $ctivation

582$5T 2ecord 7ynamic 6in

5alling2outine

$ctivation

2ecord

We shall note that the activation record may easily be of different

lengths. Therefore! we shall store the stac as a lined list! so that we can

pop it out later. The lin to pointing to the calling procedureRs activation

record is called the dynamic lin.

What should be stored in the activation recordM #vidently! the

following must be included:

(1)The return address.

(&)The parameters which should be passed.

()The local variables.

(0)The dynamic lin.

We may arrange an activation record as follows:

local variables

4arameters

2eturn address

7ynamic lin

J&


83/103

+or instance! when the program is e"ecuted! the activation record will

appear as below:

T-4*T$5 +$5(>M)

3(>)-perating *ystem

entry point

582$5T 7ynamic lin

3ote the difference between a non%recursive language compiler and a

recursive language compiler. In a recursive language compiler! the

variables are assigned to locations in an activation record which is pushed

into a stac. $s will be e"plained later! every time a subroutine call is

made! it is necessary to put all of its variables into an activation record

and pushed into a stac.

*uppose we have an activation record as follows:

T-4*T$5 H

G

2eturn address

582$5T 7ynamic lin

6et us consider the statement:

H>GK

J


84/103

We shall not use the following ind of instructions any more:

67 G

$77

*T H

Instead! we shall do the following:

67 582$5T(&)

$77 582$5T()

*T 582$5T(0)

In other words! there is a table relating G! and H to the offsets from the

address 582$5T. This information may be easily stored in the identifier

Table.

When the first +* is called! a new activation record is pushed into the

stac as below:

'(>M)

2 +$5(>M)

I 3(>)

$71 -perating system entry point

7ynamic lin 7ynamic lin

+or this new activation record! e"planations are in order:

(1)7ynamic lin! $71 ! I and 2 are put into the activation record before

the

J0


85/103

J/


86/103

V* +*

instruction.

(&)' is put into the activation record by the codes inside the

subroutine.

()*ince we assume that call by address is used! I and 2 are pointers

pointing to 3 and +$5 respectively.

The following instructions can be used to construct the new activation

record and ,ump to the subroutine +*. (The reader has to remember that

before the new activation record is constructed! 5825$T and

T-4*T$5 all refer to the old activation record.)

I35 T-4*T$5

67 T-4*T$5

*T T#'4 (The present T-4T$5 is temporarily stored.)67 582$5T

*TI T-4*T$5 (The dynamic lin is now put into the new activation record.)

I35 T-4*T$5

67 $71

*TI T-4*T$5 ( $71 is put into the new activation record.)

I35 T-4*T$5

67 582$5T

$77 TW-

*TI T-4*T$5(I is put into the new activation record and it point to 3.)

I35 T-4*T$5

J=


87/103

67 582$5T

$77 T;2##

*TI T-4*T$5(2 is put into the new activation record and it points to +$5.)

67 T#'4

*T 582$5T(The new 582$5T is now established.)

V* +* (Vump to +*).

3ote that this new activation record is only partially constructed before

the

V* +*

instruction because there is no way for the code generator to now the

e"act siDe of the new activation record. The ,ob of constructing this new

activation record will be completed within *ubroutine +*. In other

words! within +*! there will be instruction which put Fariable ' into

the new activation record.

$fter the new activation record is constructed! whenever a variable is

mentioned! the code generator uses its offset with respect to 582$5T.

+or instance! within +*! the instruction

'>I%1

will be translated into the following codes:

67 582$5T(&)

*8 -3#

*T 582$5T(0)

JE


88/103

+inally! the stac will appear as follows:

' '(>1) '(>&)

2 2 2 +$5(>M)

I I I 3(>)

$72 $72 $71 -perating system entry point

7ynamic lin

0 & 1

$t this point! no more subroutine calls are necessary and control goes

to $72 . 2 is computed to be equal to 1. $ctivation 2ecord 0 is popped

out. I is pointing to & as registered in $ctivation 2ecord . $ctivation

2ecord is pooped out with 2 computed to be equal to &. 5ontrol goes to

$72 with 2 computed to be equal to = and +$5 is set to be equal to =

because of an indirect store for 2. $ctivation 2ecord & is now popped

out. 5ontrol goes to $71 and +$5 is printed out as equal to =.

We would lie to emphasiDe here that +$5 has to be permanently

changed after the subroutine calls. Therefore call by address must be

used here.

JJ


89/103

*ection /. Interpreter

$ compiler translates a high%level language into a low%level language

to be e"ecuted later by the target machine. There are many cases where

we do not want such a thorough translation because we can translate the

source program into some intermediate code and directly e"ecute this

intermediate code. In such cases! we are not compiling we are merely

interpreting. $fter compiling! we have a machine%language ob,ect

program! while we do not have that after interpreting. #verytime we

want to rum this program! we have to interpret it again.

Interpreting can be e"plained by considering an e"ample. 5onsider the

following high%level instruction:

G>K8OF

The intermediate code for the above instruction may consist of the

following two quadruples:

(O!8!F!T1)

(K!!T1!G)

$ compiler will e"amine the above quadruples and then generate

assembly language instructions as follows:

67 8

'4 F

*T T1

J


90/103

3ote that this is the only thing that a compiler does. It does not bother to

have these instructions e"ecuted.

$n interpreter will not generate these assembly language instructions.

It is more straightforward it simply e"ecutes the first quadruple by

causing 8 to be multiplied by F and storing the result in T1. 6ater! it

e"amines the second quadruple and will immediately e"ecute it by adding

to T1 and storing the result into G.

*ince there is no ob,ect program generated! an interpreter requires much

less space. This is why in many computers where memory space is

limited! interpreters! instead of compilers! are used. The most famous

language which uses the interpreter concept is $*I5. We began to see

$*I5 compilers coming out only recently.

When one is debugging a program ! it will be vary useful if we have an

interpretive mode incorporated into the compiler. That is! we may give

the user a choice between generating an ob,ect code program and

interpreting it . This will give the user a quic way to find out whether

his program is correct or not. $fter he is satisfied with his program! he

can then request that his program be compiled.

L


91/103

*ection /.0 The 2elationship between a 5ompiler and the -perating

*ystem

8p to now! we have carefully avoided the issue of compiling

inputPoutput instructions. To translate an inputPoutput instruction into

machine dependent assembly language code! one can not avoid using the

software facilities provided by the operating system. +or instance! we

never write our own inputPoutput drivers unless it is absolutely necessary

to do so ! because we can and should use the drivers provided by the

operating system. esides! we usually can not sip the inputPoutput

system provided by the operating system even if we want to do so. This

is due to the fact that no one can arbitrarily write data onto diss! as these

actions may cause calamities. 3ote that there are system programs as

well as other usersR data on the dis! and your writing action may destroy

them totally and crash the system.

+urther more! a compiler may allow a user to use subroutines already

defined and stored in the system. It may also allow the use of overlay

facilities. In both cases! a compiler writer has to now quite a lot of the

operating system under which this compiler is going to wor.

In conclusion! there is no way! in practice! to separate a compiler from

the operating system. In this chapter! we have not touched those

problems because these issues can not be easily discussed and the

techniques used are usually not considered to be compiler writing

techniques.

1


92/103

&


93/103

*ection = The Implementation Techniques.

We have discussed compiler writing techniques. We now as a crucial

question: What language should we use to write compilersM *hould we

use machine languages or some high%level languageM

-f course! it is much easier to use a high%level language. The only

reason for using some low%level language is that it produces a more

efficient compiler. ut the problem of maintaining such a compiler can

be quite serious.

There are several problems arising from using high%level languages to

write compilers. 6et us imagine that we want to write a 4$*5$6

compiler and we have concluded that 4$*5$6 is the ideal language to be

used to write this compiler. ;ere is the dilemma. We do not have a

4$*5$6 compiler yet. ;ow can we use 4$*5$6 to write any programM

To solve this problem! we may use some language available to us to

write a very small compiler which will wor for a small subset of

4$*5$6 instructions. We shall call the language comprising this

4$*5$6 subset 4$*5$61 . 8sing 4$*5$61 ! we can write a compiler for

a larger subset of instructions in 4$*5$6. Thus a 4$*5$6 2 compiler is

now produced. We may repeat this process until the entire set of

4$*5$6 instructions are included in some compiler and this compiler is

a 4$*5$6 compiler.


94/103

6et us consider another problem. *uppose we want to write a

4$*5$6 compiler for 42I'# &/L. et! for some reason! we lie to use a

F$G machine. -ne reason may be that there is a very good 4$*5$6

compiler on the F$G machine which means that we can use the e"cellent4$*5$6 language to write our compiler. $nother possible reason is that

the 42I'# &/L has not arrived yet.

If the 42I'# &/L is available to us! we can use it to test the

correctness of our compiler. That is !we can always e"ecute the ob,ect

code program on the 42I'# &/L. If the 42I'# &/L is not available! we

must have an emulator of the 42I'# &/L on the F$G machine. We can

use this emulator to test our compiler ob,ect program.

$fter the correctness of our compiled is established! we still have to

transport it to the 42I'# &/L. ;ow do we do thatM 2emember that this

compiler is written in 4$*5$6 which does not e"ist in the 42I'# &/L

machine. What we do is to compiler our own compiled. The result is a

42I'# &/L a

Documents

Francis Compiler