Upload
david-mihai
View
219
Download
0
Embed Size (px)
Citation preview
8/9/2019 Francis Compiler
1/103
Introduction to Compiler Writing
R. C. T. Lee
Institute Of Computer and Decision Sciences
National Tsing Hua Universit
Hsinc!u" Tai#an
Repu$lic Of C!ina
C. W. S!en
%irst International Computer Corporation
Taipei" Tai#an
Repu$lic Of C!ina
S. C. C!ang
Department Of &lectrical &ngineering and Computer Sciences
Nort!#estern Universit
&vanston" Illinois
U. S. '.
8/9/2019 Francis Compiler
2/103
'(STR'CT
This chapter introduces compiler writing techniques. It was written
with the following observations made by the authors:
(1) While compiler writing may be done without too much nowledge
of formal language or automata theory! most boos on compiler writing
devote most of their chapters on these topics. This inevitably gives most
readers a false impression that compiler writing is a difficult tas. In fact!
it is our observation that may people many easily develop some inferior
comple" out or reading these difficult boos.
We have deliberately avoided this formal approach. #ach aspect of
compiler writing is presented in such an informal way that any naive
reader can finish reading this chapter within! say five or si"! casual hours.
$s a matter of fact! a lot of readers finished reading this chapter before
they went to sleep. They almost treated this chapter as a bed%time novel.
(&) There are several different aspects of compiler writing. 'ost
boos would give detailed presentations on every aspect and still fail to
give the reader some feeling on how to write a compiler. We decided to
use a different approach we shall give an anatomy of a simple compiler.
'ost important of all! the internal data structure of this simple compiler
will be clearly presented. #"amples will be given and a reader will be
able to understand how a compiler wors by studying these e"amples. In
&
8/9/2019 Francis Compiler
3/103
other words! we mae sure that a reader will not lose his global view of
compiler writing he sees not only trees! but also forests.
8/9/2019 Francis Compiler
4/103
*ection 1. Introduction.
$ compiler is a software which translates a high%level language
program into a low%level language program which can be e"ecuted by a
computer. In many compilers! the low%level language is the machine
language. +or our purpose! we assume that our compiler translates a
high%level language into an assembly language simply because it is easier
for us to comprehend assembly languages.
It is our e"perience that a student! while learning compiler writing
technique! has to spend a ma,or part of his time studying an assembly
language. This is rather unfortunate because the code generation part of a
compiler is quite straightforward. -ne ,ust has to remember many
details. y spending too much time in learning assembly languages! a
student loses his precious time to study the other important techniques in
compiler writing.
To mae sure that a student would not have to spend too much time
in studying assembly languages! we decided to use a very simple
assembly language. The assembler language is described in *ection /. $s
the reader can see! it is very easy to learn .
0
8/9/2019 Francis Compiler
5/103
et! it contains enough instructions that it will not be difficult for a
student to translate a high%level language into this assembly language.
If a student is ased to write a compiler with our assembly language as
its target language! he will not be bogged down by the details of the
assembly language. Instead! he can concentrate all his mind to study the
other important compiler writing sills.
*o far as the high%level language is concerned! it is the same story we
do not want to use a very complicated high%level language! such as
+-2T2$3! or 4$*5$6. We do not want to use a subset of +-2T2$3!
either for various reasons which will become clear later. We therefore
designed a high%level language! called +2$35I*! for our compiler
writing purpose. +2$35I* is sufficiently complicated that writing a
compiler for it is a non%trivial ,ob. *till! +2$35I* is ideal for compiler
writing because we deliberately designed it for this purpose. The reader
will gradually appreciate this point.
#ssentially! +2$3I5* is similar to +-2T2$3! e"cept that it contains
some special features from 4$*5$6. It contains most of the well%nown
+-2T2$3 instructions! such as 7I'#3*I-3! *82-8TI3#! 9- T-
and so on. *o far as the I+ statement is concerned! we have an I+.
T;# . #6*# type of statement! a typical feature borrowed from
4$*5$6. ;owever! to simplify the synta" analysis discussion! we do not
allow I+ inside I+. $ll of the variables have to be declared. $gain! a
feature borrowed from 4$*5$6. We shall first assume that +2$35I* is
/
8/9/2019 Francis Compiler
6/103
not a recursive language in the sense that subroutines in +2$35I* can
not call themselves. $fter maing sure that students understand the basic
concept of compiler writing! we then add the recursive feature and also
allow the I+ statement to contain I+ statements. y that time! it should
not be difficult at all to understand how these can be done.
The reader may as: Why is the language called +2$35I*. Well! we
name it +2$35I* in memory of the great saint! *t. +rancis of $ssisi.
This chapter was written when the world was facing an energy crisis.
$ lot of people were quite worried because they might face a dry summer
(not enough gasoline for their cars) and a cold winter (not enough heating
oil for their homes). The fact that millions of homeless people having not
enough food to eat and not enough water to drin did not seem to bother
them.
We believe that the world does not lac energy. What we need is more
warmth and more love. $s the saying goes!
8/9/2019 Francis Compiler
7/103
: > ? @ A B C
#nglish words enclosed by B and C are elementary constructs in
+2$35I*! The symbol ::> means BletterC@BletterC?BdigitCA.
In +2$35I* reserved wors are used. These reserved words can not
be used as ordinary identifiers. They are all underlined in the definition.
+or instance! the following e"pression describes the synta" rules of array
declaration instructions:
Barray declaration part ::> 7I'#3*I-3 array declaration +or the
definition of array definition! consult the appendi".
E
8/9/2019 Francis Compiler
8/103
*ection &. $ 9limpse of 5ompilers.
efore getting into the details of a compiler! let us consider a very
simple program and see what the compiler should do. The program is as
follows:
F$2I$6# I3T#9#2: G! ! H
G > E
>
H >G %
#37
+or the first instruction declaring G! and H to be integers! the
compiler will generate the following assembly language instructions:
G 7* 1
7* 1
H 7* 1
+or the instruction
G > E
the compiler will create a constant! say called Il! and assembly language
codes as follows:
J
8/9/2019 Francis Compiler
9/103
8/9/2019 Francis Compiler
10/103
I1 75 E
67 I1
*T G
*imilarly! for instruction
>
the compiler will generate the following codes:
I& 75
67 I&
*T
+inally! for the instruction
H > G K
the compiler will generate
67 G
$7
*T T1
67 T1
*T H
The reader may wonder why the above sequence of codes are so
inefficient. They should simply be
67 G$7
*T H
In fact! we deliberately showed the codes which are not
efficient for reasons that will become clear later.
8ltimately! the high%level language program will be translated into the
following assembly language codes:
1L
8/9/2019 Francis Compiler
11/103
G 7* 1
7* 1
H 7* 1
I1 75 E
I& 75
67 I1
*T G
67 I&
*T
67 G
$7
*T T1
67 T1
*T H
This is what a compiler would do. ut! how does it do itM
asically! a compiler is divided into three parts: the le"ical analyDer!
the synta" analyDer and the code generator! as described below:
;igh level 6#GI5$6 *3T$G 5-7#
language programs $3$6H#2 $3$6H#2 9#3#2$T-2
fig. &.1
(1) The input of the le"ical analyDer is a sequence of characters
representing a high%level language program. The ma,or function of the
le"ical analyDer is to divide this sequence of characters into a sequence of
11
8/9/2019 Francis Compiler
12/103
toens. +or instance! consider the instruction.
F$2I$6# I3T#9#2: GN N H
The output of the le"ical analyDer isF$2I$6#
I3T#9#2
:
G
N
N
H
(&) The input of the synta" analyDer is a sequence of toens which is
the output of the le"ical analyDer. The synta" analyDer divides the toens
into instructions. +or each instruction! using the (grammatical rules)
already stored! the synta" analyDer determines whether this instruction is
grammatically correct. If not! error messages will be printed out.
-therwise! this instruction is decomposed into a sequence of basic
instructions and these instructions are fed into the code generator to
produce assembly language codes.
+or instance! consider the instruction
G>K8OF
The synta" analyDer would determine that this instruction is indeed
correct and later produce three basic instructions as follows
T1>8OF
T&>KT1
G>T&
() The code generator accepts the output from the synta" analyDer.
1&
8/9/2019 Francis Compiler
13/103
#very basic instruction is now translated into a sequence of assembly
language instructions. This can be easily done because the code generator
can e"amine the pattern of the basic instruction and determine which
sequence of codes it must generate.
To mae sure that the code generator can recogniDe the pattern easily!
we may assume that each basic instruction produced by the synta"
analyDer is of a standard from. In our case! we may assume that the
standard form is a quadruple as follows:
(operator! -perand1! -perand& ! 2esult).
+or instance! for basic instruction
G>KHN
its standard form will be
(KN N HN G)
+or the instruction
G>%HN
The standard form will be
(% N N HN G)
The code generator generates assembly language codes by e"amining the
operator! operands and result.
1
8/9/2019 Francis Compiler
14/103
*ection . The 6e"ical $nalyDer
$s we indicated before! the function of the le"ical analyDer is to group
the input sequence of the characters into toens. This is the ma,or
function. $ctually! as the reader will see! the le"ical analyDer does more
than identifying toens.
*ection .1 The Identification of Toens.
*o far as the language +2$35I* is concerned! the identification of
toens is trivial. 3ote that toens in +2$35I* are separated by
delimiters! such as
8/9/2019 Francis Compiler
15/103
1/
8/9/2019 Francis Compiler
16/103
3ote that for some other high%level languages! it is by no means easy to
identify toens. $ typical e"ample is +-2T2$3. In +ortran!
8/9/2019 Francis Compiler
17/103
7elimiters
1
& (
)0 >
/ K
= %
E O
J P
↑
1L Q
11 R
1& :Table 1
*ection .&. The Types of Toens 2ecogniDed by the 6e"ical $nalyDer.
We have shown how the le"ical analyDer identifies toens. In this
section! We shall illustrate some other functions of the le"ical analyDer.
3ote that the ma,or ,ob of the le"ical analyDer is to prepare everything
for the synta" analyDer. Imagine that the synta" analyDer encounters the
following instruction:
F$2I$6# I3T#9#2 G! ! H
In this case! the synta" analyDer recogniDes that this instruction is a
declaration instruction because it detects the toen SF$2I$6#. In
+2$35I*! the word SF$2I$6# has its particular meaning and can
not be used in any other way. +or instance! one can not write the
following instruction:
1E
8/9/2019 Francis Compiler
18/103
VARIABLE = VARIABLE + 1;
The word SF$2I$6# is a reserved word for +2$35I*. *imilarly!
the words SI+ ! ST;#3 ! S#6*# and so on are all reserved words in
+2$35I* which can not be used as ordinary variables.
It is appropriate to point out here that in +-2T2$3! there is no such a
reserved word concept. Indeed! one may have
I+ > I+ K 1
in +-2T2$3 and this is absolutely not allowed in +2$35I*.
*ince the recognition of reserved words is so important for the synta"
analyDer! it is appropriate for the le"ical analyDer to do the ,ob for it.
Table & contains all of the reserved words used in +2$35I*. *ince this
table is also1. $37&. --6#$3. 5$660. 7I'#3*I-3/. #6*#
=. #34E. #3*J. #U. 9#1L. 9T11. 9T-1&. I+1. I348T10. I3T#9#2 1/. 6$#61=. 6#1E. 6T1J. 3#
1. -2 &L. -8T48T&1. 42-92$'&&. 2#$6&. *82-8TI3#&0. T;#3&/. F$2I$6#
Table & (2eserved Word Table)
1J
8/9/2019 Francis Compiler
19/103
searched very frequently! we may use hashing to store and retrieve data
for this table.
In addition to recogniDing reserved words! we may also recogniDe
numerical constants in the le"ical analyDer phase. This can be easily done.
If a toen starts with a number! it must either be a real number or an
integer. It is easy to differentiate these two because a real number
contains the S..
In summary! for any toen! the le"ical analyDer determines whether it
is a delimiter! a real number! an integer! a reserved word or an identifier.
We! of course! have never defined identifiers. $n identifier is a toen
which is not a delimiter! a real number! an integer or a reserved word. In
general! an identifier usually denotes a variable! an array! a label! the title
of a program or the title of a subroutine.
*ection .. The 2epresentation of Toens.
We now as the question: ;ow does the le"ical analyDer let the synta"
analyDer now what the type of a toen is .
efore answering this question! let us note another problem. -ur
toens are of different lengths. +or internal representation! this is not an
ideal situation.
3ote that a toen! after the le"ical analysis is e"ecuted! must belong to
one of the following types:
1
8/9/2019 Francis Compiler
20/103
1. 7elimiters.
&. 2eserved words.
. Integers.
0. 2eal numbers.
/. Identifiers.
#ach class of toens has a table associated with it. We have already
shown that delimiters are contained in Table 1 and reserved words are
contained in Table &. We now discuss why the other toens also need
tables.
Whenever a toen is identified as an integer! we put this integer into
Table . This is necessary because finally! the code generator is going to
use this table to generate constants. +or instance! consider the following
instructions
G > /
> E
$fter the e"ecution of the le"ical analysis phase! Table will contain two
integers as below:
/
E
.
.
.
*imilarly! we shall designate Table 0 to contain real numbers.
+or each identifier! we shall put it into some location in Table /. the
Identifier Table. The Identifier Table is designed to contain three entries:
&L
8/9/2019 Francis Compiler
21/103
the subroutine to which the identifier belongs! the type of the identifier
and a pointer pointing to some other tables. These will be e"plained in
detail later. 'eanwhile! we merely have to note the following:
(1) #very identifier is hashed into Table /.
(&) If an identifier appears more than once in the same subroutine! this
identifier is put into Table / only once.
() If the same identifier appears in different subroutines! it will
occupy different locations in Table /.
We shall use the second column entry of Table / to point to the subroutine
in which identifier appears.
6et us consider the following e"ample.
42-92$' '$I3
F$2I$6# I3T#9#2: 8! F! W! H
8 > /=
F > E
W > 8 K F
5$66 $1(W! 1=! H)
#34
*82-8TI3# $1(I3T#9#2: G! ! H)
H > G K&.E O
#3*
&1
8/9/2019 Francis Compiler
22/103
In the above e"ample! there are two subroutines: '$I3 and $1 (The
main program is also considered to be a subroutine.). The identifiers
which occur in the main program are
'$I3
8
F
W
H
and the identifiers occurring in the subroutine $1 are
$1
G
H.
3ote that the identifier H occurs in both '$I3 and $1. It will appear
twice in Table /.
$fter the e"ecution of the le"ical analyDer! Table / may loo lie the
table shown below:Identifier *ubroutine Type 4ointer
1& W '$I30 H
/ F = G JEJ $1 8 J
1L J11 H
Table /
&&
8/9/2019 Francis Compiler
23/103
5onsider the second entry of the above table. This entry contains
Identifier W which points to the third location of Table /. *ince the third
location of Table / contains '$I3! this identifier W appears in
*ubroutine '$I3. *imilarly! in the tenth location! we have Identifier pointing to the eighth location of this table. *ince the eighth location
corresponds to *ubroutine $1! appears in *ubroutine $1.
We have shown that each toen! after the le"ical analysis phase! is
found to be associated with a unique location of one of the tables from
Table 1 to Table /. The meaning of each table is summariDed as follows:
1. 7elimiter Table
&. 2eserved Word Table
. Integer Table
0. 2eal 3umber Table
/. Identifier Table.
We may say that each toen is characteriDed by two numbers i and , if
it occupies the ,th location of the ith table. Indeed! to solve the problem
that toens are of different lengths! we may simply use this &%tuple (i !
,)to represent this toen. 3ote that this representation is one%to%one in the
sense that given ( i ! , )! we can uniquely identify the toen. +or instance!
for the above e"ample! the &%tuple (/!&) denotes Identifier W and (/!1L)
denotes .
6et us now give a complete e"ample to illustrate our idea by
considering the following programs:
42-92$' '$I3
F$2I$6# I3T#9#2: 8! F! '
&
8/9/2019 Francis Compiler
24/103
8 > /
F >E
5$66 *1(8! F! ')
#34
*82-8TI3# *1(I3T#9#2: G! ! ')
' > G K K&.E
#3*
$fter the e"ecution of the le"ical analyDer! Table will be as
follows:
1 /
& E
Table (Integer Table)
The 2eal 3umber Table will contain only one element as below:
1 &.E
Table 0 (2eal 3umber Table)
6et us assume that after identifiers are put into Table / Table /. loos
as follows:
Identifier *ubroutine Type 4ointer
1 8
&
'$I3
&0
8/9/2019 Francis Compiler
25/103
0 1L
/ F
= '
EJ G 1L
' 1L
1L *1
Table / (Identifier Table)
The entire program will now be represented as shown below:
42-92$' '$I3
(&!&1) (/!)(1!1)
F$2I$6# I3T#9#2: 8 ! F ! '
(&!&/) (&!10) (1!1&) (/!1) (1!11) (/!/) (1!11) (/!=) (1!1)
8 > /
(/!1) (1!0) (!1) (1!1)
F > E
(/!/) (1!0) (!&) (1!1)
5$66 *1 ( 8 ! F ! ' )
(&!) (/!1L) (1!&) (/!1) (1!11) (/!/) (1!11) (/!=) (1!) (1!1)#34
(&!=) (1!1)
*84-8TI3# *1 ( I3T#9#2 : G ! ! ' )
(&!&) (/!1L) (1!&) (&!10) (1!1&) (/!J) (1!11) (/!0) (1!11) (/!) (1!) (1!1)
' > G K K &.E
(/!) (1!0) (/!J) (1!/) (/!0) (1!/) (0!1) (1!1)
#3*
(&!E) (1!1)
&/
8/9/2019 Francis Compiler
26/103
*ection 0. The *ynta" $nalyDer
The functions of the synta" analyDer are as follows:
(1) The synta" analyDer divides the input toens into instructions.
(&) +or each instruction! the synta" analyDer determines whether this
instruction is grammatically correct.
() If the instruction is not grammatically correct! it is re,ected and
some error message is printed out. -therwise! it is decomposed into a
sequence of basic instructions each basic instruction is in a standard
form. In our case! we shall use quadruples to represent basic instructions.
(0) *ome other information obtained by the synta" analyDer will be put
into the various tables of the compiler.
The first ,ob of the synta" analyDer is to divide the input toen stream
into instructions. This is easy for +2$35I* because in +2$35I*! every
instruction is terminated by the delimiter S. We would lie to point out
that this is not the case in +-2T2$3! for instance! because +-2T2$3
assumes that an instruction is contained in one card unless in the ne"t
card! a symbol appears in the =th column. This maes the ,ob of
determining the termination of an instruction much harder.
&=
8/9/2019 Francis Compiler
27/103
3ote that even in 4$*5$6! the separation of toens into instructions
is more complicated. +or instance! we do not allow an instruction as
follows:
F$2I$6# I3T#9#2 : G! ! H 2#$6 : 8! F
which is perfectly legal in 4$*5$6. In +2$35I*! we shall have two
instructions instead:
F$2I$6# I3T#9#2 : G! ! H:
F$2I$6# 2#$6 : 8! F
In 4$*5$6! compound statements are allowed. They start with #9
and end with #37. Within #9 and #37! there can be a sequence of
instructions! each of which is terminated with S. The reader should
realiDe that we deliberately designed +2$35I* to have such ind of
synta" rules so that a compiler can be easily written. In +2$35I*! we
only allow an I+ T;#3 #6*# instruction as follows:
I+ 4 $37 U T;#3 G > GK 1 #6*# G>G%1
3ote that the entire statement is only one instruction. It should not be
difficult for the reader to modify his +2$35I* compiler to handle an
instruction as follows:
I+ 4 $37 U T;#3 #9 G > G K 1 > G K & #37
#6*# #9 G > G K & > G K 6 #37
&E
8/9/2019 Francis Compiler
28/103
*ection 0.1. The 2ecognition of 7ifferent Types of Instructions.
In +2$35I*! instructions may be classified into the following
categories:
(1)program heading instructions
(&)subroutine heading instructions
()variable type declaration instructions
(0)label declaration instructions
(/)dimension declaration instructions
(=)I+ T;#3 #6*# instructions
(E)9- T- instructions
(J)assignment instructions
()5$66 instructions
(1L)IP- instructions
#ach type of instruction has its own synta" rules. The synta" analyDer
has to recogniDe each instruction and branch to a subroutine to handle this
instruction. +or instance! consider a 9- T- instruction. This instruction
must start with a reserved word 9T-. This 9T- must be followed by a
label. This label must be of the type of an identifier. That is ! it must start
with a letter and followed by integers! letters or nothing. Then! finally! we
e"pect a S following this label. *ince we have stipulated that every label
must have been declared before it is used! we have to chec whether the
&J
8/9/2019 Francis Compiler
29/103
identifier following 9T- is indeed a declared label or not. If the toen
following 9T- is not a identifier or it was not declared as a label before!
error messages will be printed.
6et us consider another case. Imagine that we have already recogniDed
the first toen to be the reserved word SF$2I$6#. We then chec
whether the ne"t toen is SI3T#9#2! S2#$6 or S--6#$3 If yes!
we go ahead to chec whether S: follows. If no! an error message is
printed. $fter finding S:! we e"pect a sequence of distinct identifiers!
none of which appeared before and each one is followed by S!. +inally!
we e"pect a S to terminate this instruction. +or each identifier appearing
in this instruction! we have to tae some semantic action which will be
e"plained in detail later. If the variables are not distinct! or some variable
is not followed by S! or some variable appeared before! error messages
will be printed. The entire process of analyDing a variable type
declaration instruction can be illustrated in +ig 0.1.
&
8/9/2019 Francis Compiler
30/103
+I9. 0.1
L
8/9/2019 Francis Compiler
31/103
The reader can see that it is absolutely necessary to recogniDe the
type of an instruction being analyDed. 6et us temporarily assume that
none of our instructions starts with a label (We shall discuss the handling
of labeled instructions later.).If this is the case! every instruction in+2$35I*! e"cept the assignment instructions! starts with a reserved
word. +or instance! the program heading instruction starts with the
reserved word 42-92$' and the I+ T;#3 #6*# instructions start with
the reserved word I+. In the following table! Table 0.1 we show how
every instruction! e"cept assignment instruction! starts with a particular
reserved word. If an instruction does not start with any reserved word!
then it must be an assignment instruction.
Instructions 2eserved Word
4rogram heading instructions 42-92$'
*ubroutine heading instructions *82-8TI3#
Fariable type declaration instructions F$2I$6#
6abel declaration instructions 6$#6
7imension declaration instructions 7I'#3*I-3
I+ T;#3 #6*# instructions I+
9- T- instructions 9T-
$ssignment instructions """
5all instructions 5$66
IP- instructions read I348T
IP- instructions went -8T48T
Table 0.1.
1
8/9/2019 Francis Compiler
32/103
$gain! we lie to point out here that in +-2T2$3! we can not
identify the type of an instruction by checing the first toen! because
+-2T2$3 has no reserved words. ;owever! those who are familiar with
5--6 or $*I5 will be able to note that every 5--6 or $*I5instruction begins with a reserved word and this reserved word uniquely
determines the type of the instruction.
2emember that in our compiler! every toen is characteriDed by two
integers. To chec whether a toen is a reserved word or not! we merely
have to chec whether the first integer is & because Table & contains all of
the reserved words.
We assumed that labels did not e"ist in the previous discussions. This
is! of course! not a valid assumption because we do allow labels to appear
in instructions. $ctually! given an instruction! our first ,ob is to chec
whether the beginning toen is a label or not.
3ote that labels were not detected in the le"ical analysis phase they
were merely identified as identifiers. 3evertheless! in +2$35I*! it is
agreed that a label must have been declared before it is used. In other
words! there must be an instruction I declaring label 6 before it is used.
$fter instruction I is processed by the synta" analyDer! as the reader will
see later! the synta" analyDer will put a note in the Identifier Table to
declare that I identifier 6 is a label. This is done by putting an appropriate
entry in the type entry of the Identifier Table. 6ater! whenever we want to
now whether any toen is a label or not! we merely have to chec this
toen in the Identifier Table. If it is indeed a label! the type entry of this
toen in the Identifier Table will declare this fact.
&
8/9/2019 Francis Compiler
33/103
$t the top of the synta" analyDer! the type of each instruction is
determined as follows:
(1)5hec the first toen of an instruction. If this is a reserved word! the
type of this instruction is determined by this reserved word.
(&)If the first toen is not a reserved word! chec whether this toen is
a label. If this is not a label! this instruction must be an assignment
instruction.
()If the first toen is a label! the type of this instruction is determined
by the second toen. If the second toen is a reserved word! then the type
of this instruction is determined by this toen. -therwise! it must be an
assignment instruction.
*ection 0.&. The 4rocessing of the 4rogram ;eading Instruction.
There can be only one program heading instruction in the entire
program and this instruction must appear as the first instruction. The
reserved word corresponding to this instruction is 42-92$'. $fter this
instruction is processed! the heading of this program will now be
classified as an identifier and the into the Identifier Table. In this
Identifier Table! we do the following:
8/9/2019 Francis Compiler
34/103
(1)In the subroutine entry of the 4rogram heading in the Identifier Table!
we have nothing. 6ater! whenever we notice that nothing appears in the
subroutine entry of an identifier table! we now that this identifier is the
name of a subroutine. 3ote that the main program can also be consideredas a subroutine.
(&)In the pointer entry! we put a S1 there. We do this to indicate that
the quadruples corresponding to this program start from the first location
of the Uuadruple Table. The Uuadruple Table is now illustrated in +ig.
0.&. +or
4ointer
*1 V
' p l
* 6
*&
Identifier Table Uuadruple Table
+ig. 0.&
.
.
.
.
.
.
.
.
.
.
.
.
.
'ain 4rogram ' p
*ubroutine *1
*ubroutine *&
*ubroutine *
l
V
6
0
8/9/2019 Francis Compiler
35/103
every subroutine! a sequence of quadruples are generated and stored
consecutively in the Uuadruple Table. In the Identifier Table! we have a
pointer for every subroutine pointing to the starting point of the
corresponding sequence of quadruples.
*ection 0.. The 4rocessing of variable Type 7eclaration Instructions.
The reserved word to identify a variable type declaration instruction
is F$2I$6#. $fter processing such an instruction! the following actions
will be taen:
(1)6et G1,X &,……,Gn be variables declared in this instruction. In
the subroutine entry of Gi in the Identifier Table! put a pointer pointing to
the subroutine it appears in. That is! suppose G i belongs to *ubroutine *
which appears in 6ocation 6s in the Identifier Table! then put 6s in the
subroutine entry corresponding to Gi in the Identifier Table.
(&) 4ut a code indicating the type of Gi in the type entry of Gi in the
Identifier Table. We may arbitrarily set the codes as follows:
$rray 1
oolean &
5haracter
Integer 0
6abel /
2eal =
/
8/9/2019 Francis Compiler
36/103
If the synta" analyDer finds out that the type of Gi is boolean! S& will be
put into the type entry of Gi in the Identifier Table.
()+or each variable Gi! a quadruple is generated in the Uuadruple
Table. This quadruple will be used by the code generator later to assign
the necessary space. 6et the location in the Identifier Table occupied by
Gi be 6i. Then the quadruple corresponding to Gi is ((/!6i)! ! ! ).
6et us imagine that a certain variable G is declared as a real variable
and the location occupied by this variable is 1/ in the Identifier Table.
Then the quadruple corresponding to this variable is ((/!1/)! ! ! !). In the
type entsy! we have a S=! indicating that it is a real variable. The code
generator! after noting the &%tuple (/!1/)! will e"amine the 1/th location
of the Identifier Table. It will notice the e"istence of S= in the type entry
and thus will generate the following assembly language codes:
G 7* &
because a real number variable will occupy two memory locations.
*ection 0.0. The 4rocessing of 7imension Instructions.
=
8/9/2019 Francis Compiler
37/103
$ dimension declaration instruction declares arrays. +or every array
declared! we have to record the type of the array! the dimensionality of
the array and the siDe of the array in each dimension. +or instance!
suppose the siDe of the array in each dimension. +or instance! suppose the
siDe of a real array is declared as (1=!/). We now have to record the fact
that the array is real! the dimensionality of this array is &! the siDe of the
first dimension is 1=! and the siDe of the second dimension is /. To store
this information! we need a new table! called Information Table. We shall
denote is Table E. Table E is a one%dimensional table . In the above case!
the information will be stored as follows:
.
.=
&
1=
/
.
.
.
Information Table
In the Identifier Table! a pointer will be made to point to the Information
Table as follows:
X2eal $rray
X7imensionality
XThe siDe of the first dimension
XThe siDe of the second dimension
E
8/9/2019 Francis Compiler
38/103
*ubroutine
$ 1 &1
Identifier Table
6et us summariDe the actions taen by the synta" analyDer to handle a
dimension declaration instruction as follows:
(1)*uppose $ is declared as an array. In the subroutine entry of $ in
the Identifier Table! 4ut an appropriate pointer to indicate the subroutine
it belongs to.
(&)In the type entry of $ in the Identifier Table! put S1 into it!
indicating that $ is an array.
()*uppose that the type code of an array is ,! the dimensionality of
$ is i and the siDes of $ corresponding to different dimensions of $ are
*1, *& , *i respectively. 4ut *1, *&, *&,… ...*i into the
Informations Table (Table E). *uppose! in the Information Table ! the ne"t
available location is 3! then the information ,. i,*1,*&,….. ,*i will
be put into locations 3! 3K1! . ! 3KiKl respectively. 4ut 3 into the
pointer entry of $ in the Identifier Table.
(0)In the Uuadruple Table ! generate a quadruple in which the first
element identifies the location occupied by $rray $ in the Identifier
Information Table
&1
&&
&
&0
J
8/9/2019 Francis Compiler
39/103
Table.
8/9/2019 Francis Compiler
40/103
That is ! if the array occupies the /th location of the Identifier Table!
then the quadruple generated will be
((/!/)! ! ! )
The information Table will not only contain information concerning
arrays. $s we shall see later !when we want to store other types of
information! such as the information concerning parameters of a
subroutine! we can also use this table. We may view the Information
Table as a warehouse where many people can store things. The only thing
that we have to do is to eep pointers pointing to the location where
information is stored.
The reader may wonder why we donRt eep separate tables for
different types of information. +or instance! why donRt we eep on table
to store information concerning arrays and another table for information
concerning subroutinesM 3ote that if we eep several tables! we have to
reserve quite a lot of memory locations for these tables. That is! we have
to eep one array for each table. *ince we do not now how large the siDe
of each table should be! we are forced to eep many rather large arrays
and later we may find out that many arrays are largely empty and a lot of
memory space is wasted. y eeping one information table! we have
decreased the degree of memory space wasting.
0L
8/9/2019 Francis Compiler
41/103
There is another important point to be made here. *uppose we write the
compiler in +-2T2$3. We certainly can not have an array which will
store integers as well as characters. The Information Table! Table E!
contains only integers. The this is possible is due to the clever design tomae sure that every piece of information is represented by integers.
*ection 0./ The 4rocessing of 6abel Instructions.
6abel instructions are easy to identify they all start with the reserved
word 6$#6. $fter recogniDing a label instruction! the following actions
will be taen.
(1)$rrange a pointer to be put into the subroutine entry of this label in
the Identifier Table.
(&)In the type entry! put
8/9/2019 Francis Compiler
42/103
.
.
.
6$#6 61LL
.
.
.
61LL G>KH
In this case! the following events must tae place:
(1)The le"ical analyDer recogniDes 61LL as an identifier and hashes it
into the Identifier Table.
(&)The synta" analyDer recogniDes the reserved word 6$#6 and
consequently decides that Identifier 61LL must be a label. It then
indicates so by putting
8/9/2019 Francis Compiler
43/103
6ater! when the code generator encounters this quadruple! it will realiDe
that the ne"t assembly language codes ne"t generated should be labeled.
+or
G>KHthe assembly language codes are
67
$7 H
*T G
ecause this instructions is labeled! the code generator will generate a
special label! say 6! in this case. Thus we may have
6 67 $7 H
*T G
Imagine that we later encounter an instruction as
9T- 61LL
In this case! the synta" analyDer checs the pointer entry of 61LL in the
Identifier Table and notices that the starting quadruple corresponding to
the instruction labeled by 61LL starts from location 3 in the Uuadruple
Table. It will therefore generate a quadruple e"pressing
8/9/2019 Francis Compiler
44/103
We hope that the reader can appreciate the rule that we do not use
integers to represent labels! as is done in many languages. In +2$35I*!
every integer recogniDed in the le"ical analysis phase is indeed a number
and it is therefore appropriate to put it into the Integer Table. If we also
used numbers to represent labels! then we have to do some synta"
analysis even in the le"ical analyDer which means that the le"ical
analyDer and the synta" analyDer can not be separated clearly.
*ection 0.E The 4rocessing of $ssignment Instructions.
We are now going to discuss the most interesting and important part of
the synta" analyDer: the processing of assignment instructions. $s we
noted before! assignment instructions are the only instruction which do
not start with a reserved word and must contain an < sign.
In the following discussion! we shall first assume that the assignment
instructions contain arithmetic e"pressions only. The method that we
present can be easily e"tended to handle assignment instructions with
logical e"pressions. esides! let us temporarily assume that arrays do not
appear in the assignment instructions.
5onsider the instruction
G>KH
00
8/9/2019 Francis Compiler
45/103
It is very easy for the synta" analyDer to recogniDe the operator
8/9/2019 Francis Compiler
46/103
(*,U,V,T1)
(K!!T1!G)
where T1 is a temporary variable.
-ur question is :;ow does the computer now that we should first
e"ecute multiplication and then addition?
This problem can be solved by converting an assignment instruction
into its 2everse 4olish 3otation. To obtain the 2everse 4olish 3otation
of an assignment instruction involving arithmetic operations only! we
shall first assume that there is an ordering of operators as follows:
O!P
K!%
( !)
>
The procedure to generate the 2everse 4olish 3otation for an assignment
instruction is as follows:
(1)The symbols of the source string move left towards the ,unction(*ee
+ig. 0.).
(&)When an operand (In our case! an operand is a variable.) arrives! it
passes straight through the ,unction.
()When an operator (In our case! an operator is a delimiter) arrives! it
pauses at the ,unction. If the stac is empty! this incoming operator slides
down into the stac.
0=
8/9/2019 Francis Compiler
47/103
-therwise! it loos at the top of the stac. While the priority of the
incoming operator is not greater than that of the operator at the top of the
stac! the operators in the stac will pop out and move to the left of the
,unction one by one. $fter the above process is finished! the incomingoperator is pushed into the stac.
(0)(always goes down to the stac.
(/)When) reaches the ,unction! it causes all of the operators in the stac
bac as far as (to be shunted out and then both parentheses disappear.
(=)When all symbols have been moved to the left. pop out all of the
operators in the stac to the left ,unction.
6et us consider the case of
G>K8OF
The steps of obtaining the 2everse 4olish 3otation of the above
instruction is illustrated in +ig. 0..
G>K8OF G! > K8OF
(*T$5)
(a) (b)
G! K8OF G! K 8OF
> >
(c) (d)
0E
8/9/2019 Francis Compiler
48/103
G! 8OF G!!8 O F
K K
> >
(e) (f)
G!!8! F G!!8!F
O O
K K
> >
(g) (h)
G!!8!F!O!K! G!!8!F!O!K!>
>
(i) (,)
+ig. 0.
The transformed 2everse 4olish 3otation is thus
G!!8!F!O!K!>.
$fter transforming an assignment instruction into its 2everse 4olish
3otation! one can decompose it into a sequence of basic instructions very
easily. The procedure is as follows: (We assume that every operator is a
binary operator here. It is easy to e"tend this method to handle unary
operators.)
*tep 1. 6et i>1.
*tep &. 4ic the first operator from left to right. 6et this operator be Oi .
0J
8/9/2019 Francis Compiler
49/103
*tep . *can bac and pic up two operands immediately preceding Oi .
6et these two operands be F1 and F2 . 9enerate a basic
instruction Oi iT V V =)!( &1 .
*tep 0. If the last symbol has already been scanned! stop. -therwise! let
i>iK1 and go to *tep &.
6et us consider the assignment instruction.
G>K8OF
again. The 2everse 4olish 3otation of the above instruction is
G!!8!F!O!K!>.
$ sequence of basic instructions are now generated:
asic Instruction Uuadruple
G!!8!F!O!K!> T1>8OF (O!8!F!T1)
G!!T1!K!> T&>KT1 (K!!T1!T&)
G!T&!> G>T& (>!T&! !G)
6et us consider another e"ample:
G>(K8)OF
The 2everse 4olish 3otation of the above instruction is obtain as follows:
G>(K8)OF G! >(K8)OF
(a) (b)
0
8/9/2019 Francis Compiler
50/103
G! (K8)OF G! ( K8)OF
> >
(c) (d)
G! K8)OF G! K 8)OF
( (
> >
(e) (f)
G!! 8)OF G!!8! ) OF
K K
( (
> >
(g) (h)
G!!8!K! OF G!!8!K O F
> >
(i) (,)
G!!8!K F *!G!8!K!F
O O
> >
() (l)
G!!8!K!F!O G!!8!K!F!O!>
>
(m) (n)
+ig. 0.0
/L
8/9/2019 Francis Compiler
51/103
The 2everse 4olish 3otation for
G>(K8)OF
is thus
G!!8!K!F!O!>
$ sequence of basic instructions are generated as follows:
asic Instructions Uuadruples
G!!8!K!F!O!> T1>K8 (K!!8!T1)
G!T1!F!O!> T&>T1OF (O!!T1!F!T&)
G!T&!> G>T& (>!T&! !G)
In the previous discussion! we have only dealt with assignment
instructions with arithmetic operations. In +2$35I*! we allow logical
assignments as well. That is! we may have an instruction as
G>4 $37 U -2 2
where 4!U and 2 are boolean variables. It should be easy for the reader to
see that the above instruction can be transformed into its 2everse 4olish
3otation as follows:
G!4!U! $37! 2! -2!>
if we agree that $37 should be performed before -2. The basic
instructions are generated as below:
/1
8/9/2019 Francis Compiler
52/103
asic Instructions Uuadruples
G!4!U!$37!2!-2!> T1>4 $37 U ($37!4!U!T1)
G!T1!2!-2!> T&>T1 -2 2 (-2!T1!2!T&)
G!T&!> G>T& (>!T&! !G)
*ection 0.J The 4rocessing of Instructions 5ontaining $rrays.
In the previous discussions! we assumed that arrays did not e"ist. If
they do e"ist. If they do e"ist! we have to tae special action. 5onsider
the following instruction:
$(I)>(&!)K0
In this case! the parameters appear because of the arrays $ and and they
must be eliminated first.
3ote that arrays must have been declared before they are used. 6et us
assume that $ and appear in the Eth and the 1=th locations in the
Identifier Table respectively. Then $ is represented by (/!E) and is
represented by (/!1L). In the Identifier Table! both of these identifiers
will be indicated to be arrays. The synta" analyDer! when it encounters
the symbol (/!E)! will now e"amine the Eth location of Table / and will
find out that this identifier represents a array. Immediately! it will branch
to a subroutine to handle this array.
6et us consider a simple e"ample first:
G>(I!V)K0
/&
8/9/2019 Francis Compiler
53/103
In this case! we want to replace (I!V) by a temporary variable. 6et us
assume that $rray was defined as follows:
7I'#3*I-3 I3T#9#2:(!)
The traditional method of memory management will arrange the array as
follows:
(1!1)
(&!1)
(!1)
(1!&)
(&!&)
(!&)
(1!)(&!)
(!)
That is ! (I!V) will be the ((V%1)'KI)th location of the array if the
dimensionalities of are ' and 3 respectively. Thus! to replace (I!V)!
we have to generate the following quadruples:
(%!V!1!T1) T1>V%1
(O!T1!'!T&) T&>T1O'
(K!T&!I!T) T>T&KI
+inally! we need the following quadruple:
(>!!T!T0)!
/
8/9/2019 Francis Compiler
54/103
which is interpreted as
T0>(T).
The assembly language codes corresponding to
(>!!T!T0)
are
67 KT%1
*T T0
The entire sequence of quadruples for
G>(I!V)K0
are(%!V!1!T1) T1>V%1
(O!T1!'!T&) T&>T1O'
(K!T&!I!T) T>T&KI
(>!!T!T0) T0>(T)
(K!T0!0!T/) T/>T0K0
(>!T/! !G) G>T/
6et us consider an e"ample in which an array appears at the left side of
the S> sign:
$(I)>(I!V)
In this case! (I!V) is replaced by a temporary variable the same way as
we discussed before. *uppose the temporary variable is T0. We then
have the situation as follows:
$(I)>T0
/0
8/9/2019 Francis Compiler
55/103
The reader can see that we have three inds of quadruples where the
operators are all S>:
(>! !G) G>
(>!$!I!G) G>$(I)(>!G!$!I) $(I)>G.
That the 5ode 9enerator will not be confused is due to the fact that $ is
a declared array while G and are not.
We discussed how to handle the &%dimensional integer arrays. It
should be easy for the reader to e"tend the above idea to three! or four
dimensional arrays. It should not be difficult to handle real arrays either.
*ection 0.. The 4rocessing of I+ T;#3 #6*# Instructions.
The I+ T;#3 #6*# instructions can be easily recogniDed because they
all start with the reserved word I+. $ general format of this ind of
instruction is
I+ # T;#3 I31 #6*# I3 2
where # is a logical e"pression and I31 and I3 2 are two simple
instructions.
The synta" analyDer! after analyDing the above instruction! would
generate the following quadruple:
(I+!4!U41 !U42 )
//
8/9/2019 Francis Compiler
56/103
where 4 is a logical variable! evaluated to 1 or L! depending on the
e"pression # and the values of its variable during the run time and U41 is
the starting quadruple corresponding to I31 . This quadruple will be
interpreted as SIf 4 is true! e"ecute the codes corresponding to U41
otherwise! e"ecute codes corresponding to U42 . *uppose the quadruple
corresponding to I31 starts from the 1=th location of the Uuadruple Table
(Table=)! then U41 will be represented by (=!1=).
6et us illustrate these ideas by an e"ample. 5onsider
I+ 4 T;#3 G>GK1 #6*# G>GK&
The synta" analyDer will generate the following quadruples! assuming
that the quadruples will start from the 1/th location of the Uuadruple
Table.
1/ (I+!4!(=!1=)!(=!1))
1= (K!G!1!T1)
G>GK1
1E (>!T1! !G)
1J (9T-! ! !(=!&1))
1 (K!G!&!T&)
G>GK&
&L (>!T&! !G)
When the first quadruple of the above sequence is generated! the synta"
analyDer nows that U41 will start from the 1=th location! but it does not
/=
8/9/2019 Francis Compiler
57/103
now where U42 will start. Therefore! the synta" analyDer will have to
wait for the generation of the 1th quadruple. When this quadruple is
generated! the synta" analyDer comes bac to Uuadruple 1/ and fills in
(=!1). The situation is the same for Uuadruple 1J. When this quadruple
is generated! the synta" analyDer does not now how long the sequence of
quadruples corresponding to G>GK& will be. It therefore waits until all
of the quadruples corresponding to G>GK& are generated.
6et us consider a more complicated case:
I+ 4 $37 U T;#3 G>GK1 #6*# G>GK&
In this case! we first have to evaluate 4 and U! assuming that 4 and U
are boolean variables. The quadruples for this instruction with be as
follows:
1/ ($37!4!U!T1Y 1= (I+!T1!(=!1E)!(=!&L))
1E (K!G!1!T&)
1J (>!T& !G)
1 (9T-! ! !(=!&&))
&L (K!G!&!T)
&1 (>!T! !G)
Uuadruple 1/ means that the value of T1 is the $37 of 4 and U. the
assembly language codes for this quadruple are:
1!7 4
$37 U
*T T
/E
8/9/2019 Francis Compiler
58/103
In general, a logical expression can be as coplica!e" as #ollo$s%
G 9T -2 6T H $37 8 #U F.
To evaluate the above e"pression! we may use the 2everse 4olish
3otation to transform it into the following form (We have assumed that
we should evaluate $37 operations before -2 operations.):
G!!9T!!H!6T!8!F!#U!$37!-2
The sequence of quadruples generated will be
(9T!G!!T1)
(6T!!H!T&)
(#U!8!F!T)($37!T&!T!T0)
(-2!T1!T0!T/)
*ection 0.1L. The 4rocessing of 9o T- Instructions.
It is very simple to process 9- T- instructions. +or9T- 6G
the synta" analyDer first has to find out whether 6G is a label or not. This
can be done by checing through the Identifier Table. If it has not been
declared as a label or it was declared to be something else! we produce an
error message. If 6G is indeed a label! we chec whether this instruction
with the label has appeared before. If it appeared! we may generate the
following quadruple:
(9T-! ! !U 4G )
where U4G is the starting quadruple corresponding to the instruction
labeled with 6G. -therwise! we have to wait until the synta" analyDer
encounters the instruction with the label. If this instruction never appears
/J
8/9/2019 Francis Compiler
59/103
or it appears more than once! we also produce error messages.
*ection 0.11 The 4rocessing of 5$66 Instructions.
When a 5$66 instruction is encountered! the synta" analyDer will have
to record the name of the subroutine being called and the arguments
appearing in the instruction. To do this! we again use the Information
Table.
$ 5$66 instruction is of the following form:
5$66 *1( 4 4 4 31 2! !....! )
The quadruple corresponding to the above instruction will be of the
following form:
/
8/9/2019 Francis Compiler
60/103
.
.
.
.(5$66! *1! !(E!')) ' 3 The number of parameters
4ointer pointing to 41
. .
. .
. .
.
4ointer pointing to 4 3
.
.
Information Table (Table E)
*ince we store identifiers and numbers in tables! we have to use two
numbers to represent each 4i . 6et us assume that our 5$66 instruction is
5$66 *1(W!1=!$!/E!)
*uppose that *1! W and $ occupy the =th! 1/th and &Eth locations of the
Identifier Table (Table /) respectively. *uppose 1= occupies the rd
location of the Integer Table (Table)! and /E. occupies the &nd location
of the 2eal Table (Table 0). 3ote that the reserved word 5$66 occupies
the rd location of the 2eserved Word Table (Table &). 6et us assume
that the ne"t available location in the Information Table is /. The
quadruple corresponding to the above instruction will be as follows:
=L
8/9/2019 Francis Compiler
61/103
.
.
.
.((&!)(/!=)! !(E!/)) / 0 3umber of parameters/ W1/
1=/&E $
0& /E.
Table E
*ection 0.1&. #"amples.
In this section we shall present two e"amples designed to show how
quadruples are generated and how the various tables will loo after the
synta" analyDer is e"ecuted.
#"ample 0.1
42-92$' $1
F$2I$6# I3T#9#2:G!!I
7I'#3*I-3 I3T#9#2:$(1&)
6$#6 61! 6&
I>1
G>/>11
61 I+ (G 9T ) T;#3 (9T- 6&) #6*# G>GK&
$(I)>G
I>IK1
9T- 61
6& #34
=1
8/9/2019 Francis Compiler
62/103
*82-8TI3# T4# 4-I3T#2
1
& I / 0
0
/ $1 1 (Uuadruple Table)
=
E $ / 1 1 (Information Table)
J G / 0
1L
11 / 0
1&
1
10 61 / / 1L (Uuadruple Table)
1/ 6& / / 1J (Uuadruple Table)
Identifier Table (Table/)
1 1& 1 0
& 1 & 1 $rray $
/ 1&
0 11
/ & Information Table (Table E)
Integer Table (Table)
=&
8/9/2019 Francis Compiler
63/103
1 ((/!J)! ! ! ) G
& ((/!11)! ! ! )
((/!&)! ! ! ) I
0 ((/!E)! ! ! ) $/ ((/!10)! ! ! ) 61
= ((/!1/)! ! ! ) 6&
E ((1!0)!(!&)! !(/!&)) I>1
J ((1!0)!(!)! !(/!&)) G>/
((1!0)!(!0)! !(/!J)) >11
1L ((&!1L)!(/!J)!(/!11)!(L!1)) T1>G 9T
11 ((&!1&)!(L!1)!(=!1&)!(=!1)) I+ T1 9- T- 1&! #6*# 9- T- 1
1& ((&!11)! ! !(=!1J)) 9T- 6&
1 ((1!/)!(/!J)!(!/)!(L!&)) T&>GK&
10 ((1!0)!(L!&)! !(/!J)) G>T&
1/ ((1!0)!(/!J)!(/!E)!(/!&)) $(I)>G
1= ((1!/)!(/!&)!(!&)!(/!&)) I>IK1
1E ((&!11)! ! !(=!1L)) 9T- 61
1J ((&!=)! ! ! ) 6"
Uuadruple Table (Table =)
#"ample 0.&.
42-92$' $&
F$2I$6# I3T#9#2: I!V!7I'#3*I-3 I3T#9#2: $(&L)!(0!/)
I>&
V>
5$66 $(I!V!)
$()>(I!V)K&.E
#34
*82-8TI3# $(I3T#9#2:G!!)
F$2I$6# I3T#9#2:H
H>=>(G%H)O&K
#3*
=
8/9/2019 Francis Compiler
64/103
*ubroutine Type 4ointer Type 4ointer
1
& G 11 0
I = 00
/ $ = 1 1 (Information Table)
= $& 1 (Uuadruple Table)
E = 0
J 11 0
= 1 0 (Information Table)
1L V = 0
11 $ 1= (Uuadruple Table)
1&1 11 0
10
1/ H 11 0
1=
Identifier Table (Table /)
1 0
& 1 $rray $
&L
0 0/ & $rray 1 &L 1 &.E
= 0 & 0
E / /
J 0 & 2eal Table
/ I / (Table 0)
1L = =
11 / V $(I!V!) E 1
1& 1L
1 / Integer Table
10 E (Table )
Information Table
(Table E)
=0
8/9/2019 Francis Compiler
65/103
Uuadruples:
1 ((/!)! ! ! ) I
& ((/!1L)! ! ! ) V
((/!E)! ! ! ! )
0 ((/!/)! ! ! ! ) $
/ ((/!)! ! ! ! )
= ((1!0)!(!0)! !(/!)) I>&
E ((1!0)!(!/)! !(/!1L)) V> 4rogram $&
J ((&!)!(/!11)!!(E!J)) 5$66 $(I!V!)
((1!=)!(/!1L)!(!E)!(L!1)) T1>V%1
1L ((1!E)!(L!1)!(!&)!(L!&)) T&>T1O0
11 ((1!/)!(/!)!(L!&)!(L!)) T>IKT&
1& ((1!0)!(/!)!(L!)!(L!0)) T0>(T)
1 ((1!/)!(L!0)!(0!1)!(L!/)) T/>T0K&.E
10 ((1!0)!(L!/)!(/!/)!(/!E)) $()>T/
1/ ((&!=)! ! ! ) #34
1= ((/!&)! ! ! ) G
1E ((/!1)! ! ! )
1J ((/!J)! ! ! )
1 ((/!1/)! ! ! ) H
&L ((1!0)!(!=)! !(/!1/)) H>=
&1 ((1!=)!(/!&)!(/!1/)!(L!=)) T=>G%H *ubroutine $
&& ((1!)!(L!=)!(!0)!(L!E)) TE>T=K&
& ((1!/)!(L!E)!(/!1)!(L!J)) TJ>TEK
&0 ((1!0)!(L!J)! !(/!J)) >TJ
&/ ((&!E)! ! ! ) #3*
=/
8/9/2019 Francis Compiler
66/103
*ection /. 5ode 9eneration
The function of a code generator of a compiler is to generate assembly
or machine language code. 5onceptually! this is the easiest phase of the
compiler writing. In practice! it is always quite difficult to write because
it is heavily machine%dependent. -ne must be able to master the target
assembly language! the operating system! the linage editor! the file
system and many other features of the target machine. In order to
simplify our discussion! we assume that we are implementing our
compiler on a very simple machine whose assembly language instructions
are shown below:
$77 add
$37 and operation
$29 argument
3 branch if negative
4 branch if positive
H branch if Dero
75 define constant
7IF divide
7-* define storage
#G4 e"ponential
#37 end
I35 increase by one
V'4 unconditional ,ump
==
8/9/2019 Francis Compiler
67/103
&'B p !o sbro!ine
67 load
67I load%indirect
'4 multiply
3-4 no operation
-2 or operation
2T3 return
*T store
*TI store%indirect
*8 subtract
GI input
G-2 e"clusive of operation
G4 output
We shall first assume that our language is non%recursive.
The code generation of recursive language will be discussed later. 6et
us consider #"ample 0.&. the code generator will first e"amine Table
and 0 and generate the following instructions:
I1 75 &L
I& 75 0
I 75 /
I0 75 &
I/ 75
I= 75 =
IE 75 1
21 75 &.E
=E
8/9/2019 Francis Compiler
68/103
Then the code generator will e"amine the quadruples one by one and
generate a sequence of instructions for each quadruple. +or instance!
consider the first quadruple. This quadruple is
((/!)! ! ! )$fter e"amining this quadruple! the code generator will find out that this
quadruple calls for the following instruction:
I 7* 1
because the third element in Table / corresponds to an identifier I which
is an integer occupying one word of memory space.
5onsider Uuadruple 0. This quadruple is
((/!/)! ! ! )
We shall first find out that (/!/) corresponds to an array! as indicated by
its type in Table /. The other characteristics of this array can be found by
its pointer pointing to Table E. In Table E! we will note that $rray $ is a
one%dimensional array and occupies twenty memory locations. Thus! the
code generator will generate the following assembly language instruction:
$ 7* &L
*imilarly! the assembly language instruction for
=J
8/9/2019 Francis Compiler
69/103
Uuadruple
((/!)! ! ! )
will be
7* &L
5onsider Uuadruple =! which is
((1!0)!(!0)! !(/!)).
(1!0) will be found to correspond to S>! (!0) will be found to correspond
to S& and (/!) will be found to correspond to SI. Thus this quadruple
will be translated into the following assembly language instructions:
67 I0
*T I
similarly! Uuadruple 11 will be translated into
67 I
$77 T&
*T T
=
8/9/2019 Francis Compiler
70/103
*ection /.1 The 5ode 9eneration for *ubroutine 5alls
6et us consider Uuadruple J of #"ample 0.&:
((&!)!(/!11)! !(E!J)).
The first &%tuple corresponds to 5$66 and (/!11) corresponds to $.
Thus the first assembly language instruction corresponding to the above
quadruple will be
V* $.
$ subroutine call will usually invoe the passing of arguments. The
arguments of this ,ump%to%subroutine instruction can be found by
e"amining the last element of the quadruple. *ince the last elements is
(E!J)! we e"amine the Jth element of Table E. This will inform us that
this calling instruction involves three parameters! namely I! V! and .
There are many methods of passing the arguments. 6et us introduce
the simplest ind first.
'ethod 1%%5all by $ddress
In this method! after the
V* $
instruction! we shall have a sequence of instructions to store the addresses
of arguments which are to be passed. Thus the instructions will be as
follows:
EL
8/9/2019 Francis Compiler
71/103
&'B A
$29 I
$29 V
$29
$fter these instructions! an instruction corresponding to
T1>V%1
will follow. That is! we shall have a sequence of assembly language
instructions as follows:
V* $
61 $29 I
6& $29 V6 $29
60 67 V
*8 -3#
*T T1
When the program is finally e"ecuted! the addresses of I!V and will be
stored in 61! 6& and 6 respectively.
6et us now tae a loo as to how the arguments are passed to the
called subroutine $! in this case. $t the very beginning of the
instructions corresponding to $! we shall have a define%storage
instruction
$ 7* 1
When the calling instruction
E1
8/9/2019 Francis Compiler
72/103
&'B A
is e"ecuted! the machine will store the address immediately after the V*
instruction! namely 61! into the location $. This action serves to build a
lin between a calling instruction and the called subroutine. $s will
become clear later! the arguments as well as the return address are all
passed through this mechanism.
3ote that in #"ample 0.&! we have to pass parameters I!V! and to G!
and (These three variables are local ones.) respectively. To pass I to G!
we may use the following instructions:
67I $
*T G
The load%indirect instruction stores 61 into the register. *ince the
address of I is stored in 61! the store instruction puts the address of I into
location G.
The other parameters can be transferred in a similar way +or instance!
the transferring of V to will be as follows:
I35 $
67I $
*T
To transfer the last argument! we do the following:
I35 $
67I $
*T
E&
8/9/2019 Francis Compiler
73/103
$t the end of the code corresponding to the subroutine $! there should
be a return instruction:
I35 $
2T3 $
3ote that the final $ point to 60 and control will eventually return to 60!
as e"pected.
It is important to note that for every parameter inside a subroutine!
which is not an ordinary variable! its address! not its value! is stored.
Thus every time that the subroutine refers to a parameter! it should use an
indirect command so that the command will actually be referring to the
variable in the calling procedure. +or e"ample! if is a parameter! then
the statement
>K1
is translated into
67I
$77 -3#
*TI
Through this way! the argument is changed in the calling program
which is desired.
E
8/9/2019 Francis Compiler
74/103
'ethod &.%%5all by Falue
The above method of passing parameters during subroutine calls is
called Scall by address because the address of a variable is used. *ince
the address is passed! it may sometimes cause trouble. +or instance! let
us consider the calling of $$771(G) again.
*82-8TI3# $771(I:Integer)
I>IK1
2#T823
#37
There is no problem if we have the following instructions:
G>&
5$66 $771(G)
-8T48T G
In this case! S will be printed out.
*uppose we have the following program:
5$66 $771(&)
V>&
-8T48T V
In this case! after
5$66 $771(&)
is e"ecuted! the constant S& will become S. Therefore
E0
8/9/2019 Francis Compiler
75/103
V will not be set to &. Instead! it will become ! which may be very
baffling to many naive users.
In contrast to Scall by address! Scall by value avoids the above
problem because only the temporary values of the parameters are passed.
Thus the changes to the parameters will be local within the subroutine.
They will not have any lasting effect.
To implement the Scall by value method! we shall store the values!
instead of the addresses! of the parameters! immediately after the ,ump%
to%subroutine instruction. +or instance! in the above case! we shall have
V* $
61 75 I
6& 75 V
6 75
Within the subroutine ! we may use the load%indirect instruction to
transfer the parameters to local variables. $ll instructions in the
subroutine would then refer to these local variables.
$ctually! if the language uses call by address! it is easy to simulate call
by value by e"plicitly having the subroutine copy the arguments into local
variables. The subroutine would then only refer to these copies! and
never to the original variables.
-ne simple way to protect the variables from the undesirable effect
of call by address is to simply rename the variables. +or e"ample! we
may do the following:
E/
8/9/2019 Francis Compiler
76/103
>G
5$66 $771()
or the subroutine may use a local variable with the statement
>G
as the first statement and all GRs replaced by Rs in the program body.
Then the calling programs would not need >G and G is not changed by
the subroutine call.
In the following! we shall assume that call by address is used unless
stated otherwise.
E=
8/9/2019 Francis Compiler
77/103
*ection /.& 5ode 9eneration for 2ecursive 6anguages
*o far we have assumed that within a *ubroutine *! there is no calling
of *. If a subroutine calls itself! what will happen? To mae sure that
the reader understands this problem! let us first consider the following
problem.
*uppose that we lie to compute the factorial function defined as below:
+$5T-2I$6 (1)>1
+$5T-2I$6 (3)>3O+$5T-2I$6 (3%1)
where 3 is a positive integer.
The following +2$35I* program correctly computes the above
function:
42-92$' +$5T-2I$6
F$2I$6# I3T#9#2: 3
F$2I$6# 2#$6: +$5I348T 3
5$66 +*(3!+$5)
$71 -8T48T +$5
#37
*82-8TI3# +*(I3T#9#2: I!2#$6:2)
F$2I$6# I3T#9#2:'
6$#6 61
I+ I 6# 1 T;#3 9T- 61'>I%1
EE
8/9/2019 Francis Compiler
78/103
5$66 +*('!2)
$72 61 I+ I 6# 1 T;#3 2>1.L #6*# 2>2OI
#3*
We have now denoted two addresses: $71 and $72 . $71 is the return
address for 5$66 +* in the main program and $72 is the return
address for 5$66 +* within itself. Imagine that the first 5$66 +* is
e"ecuted. In this case! $71 is stored indirectly as the return address.
Then! within this subroutine! we have a 5$66 +* instruction. This
time! $72 is the return address. *ince there is only one memory location
prepared for the return address of *ubroutine +*! $72 will necessarily
override $71 ! in some sense. This is quite undesirable because we now
that the control has to go eventually bac to $71 .
'any programming languages! such as +-2T2$3! solve the above
mentioned problem by simply prohibiting a subroutine from calling itself.
$ programming language which allows a subroutine to call itself is called
a recursive programming language. $69-6! 6I*4 and 4$*5$6 are all
famous recursive programming language. We would lie to emphasiDe
here that a programming language can be easily made to be recursive. In
fact! the authors have used many +-2T2$3 compilers which can handle
recursive calls.
6et us now see why our program does compute the factorial if our
compiler is appropriately implemented. Imagine that
EJ
8/9/2019 Francis Compiler
79/103
E
8/9/2019 Francis Compiler
80/103
3 is equal to . The following actions should tae place:
(1)+* is first called! with 3 being equal to and +$5 unnown. The
return address is $71 .
(&)Inside +*! I is now set to be . *ince it is not less than or equal to
1!' is set to be &. +* is called again and the return address is set to be
$72 . It is important to note that I will be equal to . To simplify the
discussion! let us denote this return address as $721 .
()$gain! inside +*! I is now equal to &! ' is set to be 1 and +* is
called. The return address is set to be $72 with I equal to &. To simplify
the discussion! let us now call this return address as $722 .
(0)+inally! inside +*! I is equal to 1. 5ontrol goes to $72 and 2 is
computed to be equal to 1.L.
(/)5ontrol then goes to $722 . *ince I is equal to &! 2 is computed to
be equal to &.
(=)5ontrol then goes to $721 . *ince I is equal to and 2 is equal to &!
2 is computed to be equal to .
(E)The value of +$5 now becomes = and control goes to $71 .
There is another problem in recursive program language compiler
design. In a non%recursive language compiler! a variable is assigned a
fi"ed memory location. +or a recursive language compiler! this obviously
can not be done. 5onsider the variable I within +*. When the first time
the subroutine is called! I is equal to. This value has to be saved in some
JL
8/9/2019 Francis Compiler
81/103
way because we are going to use it later when control comes bac to $72
. If I is assigned to a fi"ed location! we shall have serious problems.
To facilitate these recursive language properties! we may use a stac.
7uring the run time! each time a subroutine is called! an activation record
containing arguments and the appropriate return address is pushed into a
stac. To build a lin between the calling instruction and the called
subroutine! a global variable pointing to the current activation record may
be used. That is! this global variable changes each time a subroutine is
called. $ll of the return addresses and parameters can be found by using
this global variable. 6et us now denote this global variable 582$5T
(current activation record). We shall also use another global variable
T-4*T$5 to denote the top of the stac.
5onsider the above factorial program with the input 3 equal to .
When the first
5$66 +*
is e"ecuted! an activation record will be pushed into the stac as shown
below:
J1
8/9/2019 Francis Compiler
82/103
T-4*T$5
+* $ctivation
582$5T 2ecord 7ynamic 6in
5alling2outine
$ctivation
2ecord
We shall note that the activation record may easily be of different
lengths. Therefore! we shall store the stac as a lined list! so that we can
pop it out later. The lin to pointing to the calling procedureRs activation
record is called the dynamic lin.
What should be stored in the activation recordM #vidently! the
following must be included:
(1)The return address.
(&)The parameters which should be passed.
()The local variables.
(0)The dynamic lin.
We may arrange an activation record as follows:
local variables
4arameters
2eturn address
7ynamic lin
J&
8/9/2019 Francis Compiler
83/103
+or instance! when the program is e"ecuted! the activation record will
appear as below:
T-4*T$5 +$5(>M)
3(>)-perating *ystem
entry point
582$5T 7ynamic lin
3ote the difference between a non%recursive language compiler and a
recursive language compiler. In a recursive language compiler! the
variables are assigned to locations in an activation record which is pushed
into a stac. $s will be e"plained later! every time a subroutine call is
made! it is necessary to put all of its variables into an activation record
and pushed into a stac.
*uppose we have an activation record as follows:
T-4*T$5 H
G
2eturn address
582$5T 7ynamic lin
6et us consider the statement:
H>GK
J
8/9/2019 Francis Compiler
84/103
We shall not use the following ind of instructions any more:
67 G
$77
*T H
Instead! we shall do the following:
67 582$5T(&)
$77 582$5T()
*T 582$5T(0)
In other words! there is a table relating G! and H to the offsets from the
address 582$5T. This information may be easily stored in the identifier
Table.
When the first +* is called! a new activation record is pushed into the
stac as below:
'(>M)
2 +$5(>M)
I 3(>)
$71 -perating system entry point
7ynamic lin 7ynamic lin
+or this new activation record! e"planations are in order:
(1)7ynamic lin! $71 ! I and 2 are put into the activation record before
the
J0
8/9/2019 Francis Compiler
85/103
J/
8/9/2019 Francis Compiler
86/103
V* +*
instruction.
(&)' is put into the activation record by the codes inside the
subroutine.
()*ince we assume that call by address is used! I and 2 are pointers
pointing to 3 and +$5 respectively.
The following instructions can be used to construct the new activation
record and ,ump to the subroutine +*. (The reader has to remember that
before the new activation record is constructed! 5825$T and
T-4*T$5 all refer to the old activation record.)
I35 T-4*T$5
67 T-4*T$5
*T T#'4 (The present T-4T$5 is temporarily stored.)67 582$5T
*TI T-4*T$5 (The dynamic lin is now put into the new activation record.)
I35 T-4*T$5
67 $71
*TI T-4*T$5 ( $71 is put into the new activation record.)
I35 T-4*T$5
67 582$5T
$77 TW-
*TI T-4*T$5(I is put into the new activation record and it point to 3.)
I35 T-4*T$5
J=
8/9/2019 Francis Compiler
87/103
67 582$5T
$77 T;2##
*TI T-4*T$5(2 is put into the new activation record and it points to +$5.)
67 T#'4
*T 582$5T(The new 582$5T is now established.)
V* +* (Vump to +*).
3ote that this new activation record is only partially constructed before
the
V* +*
instruction because there is no way for the code generator to now the
e"act siDe of the new activation record. The ,ob of constructing this new
activation record will be completed within *ubroutine +*. In other
words! within +*! there will be instruction which put Fariable ' into
the new activation record.
$fter the new activation record is constructed! whenever a variable is
mentioned! the code generator uses its offset with respect to 582$5T.
+or instance! within +*! the instruction
'>I%1
will be translated into the following codes:
67 582$5T(&)
*8 -3#
*T 582$5T(0)
JE
8/9/2019 Francis Compiler
88/103
+inally! the stac will appear as follows:
' '(>1) '(>&)
2 2 2 +$5(>M)
I I I 3(>)
$72 $72 $71 -perating system entry point
7ynamic lin
0 & 1
$t this point! no more subroutine calls are necessary and control goes
to $72 . 2 is computed to be equal to 1. $ctivation 2ecord 0 is popped
out. I is pointing to & as registered in $ctivation 2ecord . $ctivation
2ecord is pooped out with 2 computed to be equal to &. 5ontrol goes to
$72 with 2 computed to be equal to = and +$5 is set to be equal to =
because of an indirect store for 2. $ctivation 2ecord & is now popped
out. 5ontrol goes to $71 and +$5 is printed out as equal to =.
We would lie to emphasiDe here that +$5 has to be permanently
changed after the subroutine calls. Therefore call by address must be
used here.
JJ
8/9/2019 Francis Compiler
89/103
*ection /. Interpreter
$ compiler translates a high%level language into a low%level language
to be e"ecuted later by the target machine. There are many cases where
we do not want such a thorough translation because we can translate the
source program into some intermediate code and directly e"ecute this
intermediate code. In such cases! we are not compiling we are merely
interpreting. $fter compiling! we have a machine%language ob,ect
program! while we do not have that after interpreting. #verytime we
want to rum this program! we have to interpret it again.
Interpreting can be e"plained by considering an e"ample. 5onsider the
following high%level instruction:
G>K8OF
The intermediate code for the above instruction may consist of the
following two quadruples:
(O!8!F!T1)
(K!!T1!G)
$ compiler will e"amine the above quadruples and then generate
assembly language instructions as follows:
67 8
'4 F
*T T1
J
8/9/2019 Francis Compiler
90/103
3ote that this is the only thing that a compiler does. It does not bother to
have these instructions e"ecuted.
$n interpreter will not generate these assembly language instructions.
It is more straightforward it simply e"ecutes the first quadruple by
causing 8 to be multiplied by F and storing the result in T1. 6ater! it
e"amines the second quadruple and will immediately e"ecute it by adding
to T1 and storing the result into G.
*ince there is no ob,ect program generated! an interpreter requires much
less space. This is why in many computers where memory space is
limited! interpreters! instead of compilers! are used. The most famous
language which uses the interpreter concept is $*I5. We began to see
$*I5 compilers coming out only recently.
When one is debugging a program ! it will be vary useful if we have an
interpretive mode incorporated into the compiler. That is! we may give
the user a choice between generating an ob,ect code program and
interpreting it . This will give the user a quic way to find out whether
his program is correct or not. $fter he is satisfied with his program! he
can then request that his program be compiled.
L
8/9/2019 Francis Compiler
91/103
*ection /.0 The 2elationship between a 5ompiler and the -perating
*ystem
8p to now! we have carefully avoided the issue of compiling
inputPoutput instructions. To translate an inputPoutput instruction into
machine dependent assembly language code! one can not avoid using the
software facilities provided by the operating system. +or instance! we
never write our own inputPoutput drivers unless it is absolutely necessary
to do so ! because we can and should use the drivers provided by the
operating system. esides! we usually can not sip the inputPoutput
system provided by the operating system even if we want to do so. This
is due to the fact that no one can arbitrarily write data onto diss! as these
actions may cause calamities. 3ote that there are system programs as
well as other usersR data on the dis! and your writing action may destroy
them totally and crash the system.
+urther more! a compiler may allow a user to use subroutines already
defined and stored in the system. It may also allow the use of overlay
facilities. In both cases! a compiler writer has to now quite a lot of the
operating system under which this compiler is going to wor.
In conclusion! there is no way! in practice! to separate a compiler from
the operating system. In this chapter! we have not touched those
problems because these issues can not be easily discussed and the
techniques used are usually not considered to be compiler writing
techniques.
1
8/9/2019 Francis Compiler
92/103
&
8/9/2019 Francis Compiler
93/103
*ection = The Implementation Techniques.
We have discussed compiler writing techniques. We now as a crucial
question: What language should we use to write compilersM *hould we
use machine languages or some high%level languageM
-f course! it is much easier to use a high%level language. The only
reason for using some low%level language is that it produces a more
efficient compiler. ut the problem of maintaining such a compiler can
be quite serious.
There are several problems arising from using high%level languages to
write compilers. 6et us imagine that we want to write a 4$*5$6
compiler and we have concluded that 4$*5$6 is the ideal language to be
used to write this compiler. ;ere is the dilemma. We do not have a
4$*5$6 compiler yet. ;ow can we use 4$*5$6 to write any programM
To solve this problem! we may use some language available to us to
write a very small compiler which will wor for a small subset of
4$*5$6 instructions. We shall call the language comprising this
4$*5$6 subset 4$*5$61 . 8sing 4$*5$61 ! we can write a compiler for
a larger subset of instructions in 4$*5$6. Thus a 4$*5$6 2 compiler is
now produced. We may repeat this process until the entire set of
4$*5$6 instructions are included in some compiler and this compiler is
a 4$*5$6 compiler.
8/9/2019 Francis Compiler
94/103
6et us consider another problem. *uppose we want to write a
4$*5$6 compiler for 42I'# &/L. et! for some reason! we lie to use a
F$G machine. -ne reason may be that there is a very good 4$*5$6
compiler on the F$G machine which means that we can use the e"cellent4$*5$6 language to write our compiler. $nother possible reason is that
the 42I'# &/L has not arrived yet.
If the 42I'# &/L is available to us! we can use it to test the
correctness of our compiler. That is !we can always e"ecute the ob,ect
code program on the 42I'# &/L. If the 42I'# &/L is not available! we
must have an emulator of the 42I'# &/L on the F$G machine. We can
use this emulator to test our compiler ob,ect program.
$fter the correctness of our compiled is established! we still have to
transport it to the 42I'# &/L. ;ow do we do thatM 2emember that this
compiler is written in 4$*5$6 which does not e"ist in the 42I'# &/L
machine. What we do is to compiler our own compiled. The result is a
42I'# &/L a