Index
1 Major phases of a compiler
  1.1 Lexical Analysis
  1.2 Syntax Analysis
  1.3 Semantic Analysis
  1.4 Code generation
2 The PL/0 Language
  2.1 PL/0
  2.2 Syntax of PL/0 using EBNF
  2.3 Syntax Diagram for PL/0 statement
3 Symbol Table of PL/0
4 Convert to reverse polish notation
5 Code generated by the Compiler
  5.1 Reverse Polish Notation
  5.2 Code table
  5.3 Examples of intermediate code for PL/0 constructs
  5.4 Pascal code to generate code for the "if condition then statement" construct
  5.5 Pascal code to generate code for the "while condition do statement" construct
  5.6 Example of Lexical Levels
  5.7 Use of a Display
M A Smith Page 1 December 16, 2009
Overview of the PL/0 Compiler
1 Major phases of a compiler
1.1 Lexical Analysis
Splits the program up into tokens, which are more convenient for later stages of the compiler to handle.
if, else, variable, =, != // are examples of tokens in Java
1.2 Syntax Analysis
Checks that the formation of the program conforms to the syntax of the programming language as specified in the syntax diagrams.
if ( cost < 20 ) cheap++;
The above construct is syntactically valid in Java
if ( cost < 20 ) cheap+++;
whilst the above is syntactically invalid
1.3 Semantic Analysis
Checks that the meaning behind the syntax of the program being compiled is valid.
amount = cost * VAT;
A semantic check would be that amount, cost and VAT had been declared and that they were of the correct type for the operation being performed.
1.4 Code generation
Generates code that may be executed. This is realised by one of:
• The CPU of a computer.
• An interpreter which simulates the actions of the "executable" code.
• A mix of CPU execution and interpretation.
Source Code -> Compiler -> Executable code
Symbol Table
Name   Type  Loc
Spend  Int   10
Money  Int   20

Phases: Lexical Analysis -> Syntax Analysis -> Semantic Analysis -> Code Generation

Source Code
Money = Money-Spend;

Machine code
move.w money,d0
sub.w spend,d0
move.w d0,money
2 The PL/0 Language
A toy language
Example program
var countdown;
begin
  countdown := 10;
  while countdown >= 0 do
  begin
    write countdown; // Extension
    countdown := countdown - 1;
  end
end.
Output
10 9 8 7 6 5 4 3 2 1 0
2.1 PL/0
• A block-structured language
  begin similar to { in Java
  end similar to } in Java
• := is used for assignment instead of =, so = can be used as equals in a boolean expression
• Only one data type: int
  Reminiscent of BCPL and B
• Can only output integers
  write countdown;
  The write statement is not part of the original PL/0 language, which had no I/O statements.
Conceived in about 1976 by Niklaus Wirth as a toy language used to illustrate the compiler writing process. He wrote the compiler for PL/0 in the language Pascal.
2.2 Syntax of PL/0 using EBNF
program = block "."
block = [ "const" ident "=" number {"," ident "=" number} ";" ]
        [ "var" ident {"," ident} ";" ]
        { "procedure" ident ";" block ";" } statement
statement = [ ident ":=" expression
            | "call" ident
            | "begin" statement {";" statement } "end"
            | "if" condition "then" statement
            | "while" condition "do" statement ]
condition = "odd" expression | expression ("="|"#"|"<"|"<="|">"|">=") expression
expression = [ "+"|"-"] term { ("+"|"-") term}
term = factor { ("*"|"/") factor }
factor = ident | number | "(" expression ")"
number = // What would it be
ident = // What would it be
Elements of EBNF
definition               =         Is rewritten as
concatenation            ,         No intervening white space
termination              ;
separation               |         or
option                   [ ... ]   0 or 1 times
repetition               { ... }   0 or more times
grouping                 ( ... )
double quotation marks   " ... "   Name
single quotation marks   ' ... '   Name
2.3 Syntax Diagram for PL/0 statement
[Syntax diagram for statement. Its alternative paths are: ident := expression; call ident; begin statement { ; statement } end; if condition then statement; while condition do statement.]
statement = [ ident ":=" expression
            | "call" ident
            | "begin" statement {";" statement } "end"
            | "if" condition "then" statement
            | "while" condition "do" statement ]
Treat the diagram as a railway track and simply follow a line through it.
[The syntax diagram for statement repeated; the two fragments below are routes through it.]
"if" condition "then" statement
"begin" statement {";" statement } "end"
Syntax analysis
type
  symbol = (nul, ident, number, plus, minus, times, slash, oddsym,
            eql, neq, lss, leq, gtr, geq, lparen, rparen, comma,
            semicolon, period, becomes, beginsym, endsym, ifsym,
            thensym, whilesym, dosym, callsym, constsym, varsym, procsym);
var
  ch: char (* last char. read *);
  sym: symbol (* last symb. read *);
  id: alfa (* last id. read *);
  num: integer (* last number read *);
The lexical analyser getsym (lines 87-152) returns in sym the type of the next token read from the PL/0 program.
If it is an identifier then "id" will contain the characters of the identifier.
If it is a number then num will contain its binary value.
If it is a punctuation character then the table "ssym" will be used to find the type of the punctuation character.
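The behaviour described for getsym can be sketched as follows. This is only an illustration, written in Python rather than the compiler's Pascal. The names tokenize, KEYWORDS and SSYM are assumptions made for the example, and the sketch returns the whole token list at once instead of one symbol per call.

```python
# Hypothetical sketch of a getsym-style lexical analyser for PL/0.
# Reserved words and the symbol type each one maps to.
KEYWORDS = {"begin": "beginsym", "call": "callsym", "const": "constsym",
            "do": "dosym", "end": "endsym", "if": "ifsym", "odd": "oddsym",
            "procedure": "procsym", "then": "thensym", "var": "varsym",
            "while": "whilesym"}

# Single-character symbols, playing the role of the ssym table.
SSYM = {"+": "plus", "-": "minus", "*": "times", "/": "slash",
        "(": "lparen", ")": "rparen", "=": "eql", ",": "comma",
        ".": "period", "#": "neq", "<": "lss", ">": "gtr", ";": "semicolon"}


def tokenize(text):
    """Return (sym, value) pairs; value plays the role of id or num."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha():                      # identifier or reserved word
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1
            word = text[i:j]
            tokens.append((KEYWORDS.get(word, "ident"), word))
            i = j
        elif ch.isdigit():                      # number, converted to an integer
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("number", int(text[i:j])))
            i = j
        elif text[i:i + 2] == ":=":
            tokens.append(("becomes", ":="))
            i += 2
        elif text[i:i + 2] in ("<=", ">="):
            tokens.append(("leq" if ch == "<" else "geq", text[i:i + 2]))
            i += 2
        elif ch in SSYM:                        # punctuation looked up in the table
            tokens.append((SSYM[ch], ch))
            i += 1
        else:
            tokens.append(("nul", ch))          # character with no symbol type
            i += 1
    return tokens


print(tokenize("countdown := 10;"))
```

The table lookup for punctuation mirrors the use of ssym; reserved words are separated from identifiers only after the whole word has been read.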
Initialisation of tables used by lexical analysis
for ch := chr(0) to chr(127) do ssym[ch] := nul;

word[1] := 'begin   ';  word[2] := 'call    ';
word[3] := 'const   ';  word[4] := 'do      ';
word[5] := 'end     ';  word[6] := 'if      ';
word[7] := 'odd     ';  word[8] := 'procedur';
word[9] := 'then    ';  word[10] := 'var     ';
word[11] := 'while   ';

wsym[1] := beginsym;  wsym[2] := callsym;
wsym[3] := constsym;  wsym[4] := dosym;
wsym[5] := endsym;    wsym[6] := ifsym;
wsym[7] := oddsym;    wsym[8] := procsym;
wsym[9] := thensym;   wsym[10] := varsym;
wsym[11] := whilesym;

ssym['+'] := plus;    ssym['-'] := minus;
ssym['*'] := times;   ssym['/'] := slash;
ssym['('] := lparen;  ssym[')'] := rparen;
ssym['='] := eql;     ssym[','] := comma;
ssym['.'] := period;  ssym['#'] := neq;
ssym['<'] := lss;     ssym['>'] := gtr;
ssym['$'] := leq;     ssym['@'] := geq;
ssym[';'] := semicolon;
var a,b;
The code required to check if a var declaration in PL/0 is valid would be:
if sym = varsym then
begin
  getsym;
  while sym = ident do
  begin
    getsym;
    if sym in [comma,semicolon] then
    begin
      if sym = comma then getsym
    end
    else error(5);
  end;
  if sym = semicolon then getsym else error(5);
end;
[This code does not attempt any error recovery.]
The actual code used by PL/0 is as follows. It attempts to recover from errors made by a user in specifying a var declaration.
procedure vardeclaration;
begin
  if sym = ident then getsym
  else error(4)
end;

if sym = varsym then
begin
  getsym;
  repeat
    vardeclaration;
    while sym = comma do
    begin getsym; vardeclaration end;
    if sym = semicolon then getsym else error(5)
  until sym <> ident;
end;
Error Recovery in PL/0
Apart from some minor concessions to error recovery in the construction of the syntax analysis, there is a generalised strategy which is employed on the detection of an error.
That is, on detection of a syntax error, to skip to a major syntactical unit and then proceed in the normal way.
type symset = set of symbol;
var declbegsys, statbegsys, facbegsys: symset;
declbegsys := [constsym, varsym, procsym];
statbegsys := [beginsym, callsym, ifsym, whilesym];
facbegsys := [ident, number, lparen];
declbegsys, statbegsys and facbegsys are all used to inform the syntax analyser which token to skip to on detection of an error.
For example, in the call to block at line 649:
block(0, 0, [period] + declbegsys + statbegsys);
The third parameter is the set of symbols to which the syntax analyser will skip on detection of an error.
Also in the call to statement at line 438:
statement([semicolon, endsym] + fsys);
This is usually done with the procedure TEST (lines 154-161)
procedure test(s1, s2: symset; n: integer);
begin
  if not (sym in s1) then
  begin
    error(n); s1 := s1 + s2;
    while not (sym in s1) do getsym
  end
end;
If the current symbol is not in s1, test generates error message n and then skips tokens until it finds a token in s1 or s2.
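The skipping strategy of test can be sketched as below. This is an illustration in Python rather than Pascal. The function name skip_to, the list-of-symbols token stream and the errors list are assumptions made for the example, since the real procedure works on the global sym and calls getsym.

```python
# Hypothetical sketch of the skip-to-a-known-symbol strategy used by test.
def skip_to(tokens, pos, s1, s2, n, errors):
    """If tokens[pos] is not in s1, record error n and skip forward.

    Skipping stops at the first token found in s1 or s2. The position at
    which parsing should resume is returned.
    """
    if tokens[pos] not in s1:
        errors.append(n)
        stop = s1 | s2
        # The bound on pos stops the scan at the last token, so the loop
        # cannot run off the end of the stream.
        while pos < len(tokens) - 1 and tokens[pos] not in stop:
            pos += 1
    return pos


errors = []
stream = ["times", "plus", "semicolon", "period"]
resume = skip_to(stream, 0, {"ident", "number", "lparen"},
                 {"semicolon", "period"}, 24, errors)
print(resume, errors)
```

Here the parser expected the start of a factor, found times instead, reported error 24 and resumed at the semicolon.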
Reverse Polish Notation
3 Symbol Table of PL/0
The following declarations define the symbol table in PL/0:
type
  alfa = packed array [1..al] of char;
  object = (constant, variable, proc);
var
  table: array [0..txmax] of record
    name: alfa;
    case kind: object of
      constant: (val: integer);
      variable, proc: (level, adr: integer)
  end;
The major routines to manipulate this are:
enter (lines 183-203)
which enters a name, its type and value into the symbol table
position (lines 206-214)
which finds the position of an identifier in the table. If the identifier is not there it returns 0.
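A minimal sketch of these two routines, in Python rather than Pascal, is given below. The dictionary fields follow the record declared above. The keyword parameters and the backwards search (so that the most recent declaration of a name is found first) are assumptions of the sketch, not a transcription of lines 183-214.

```python
# Hypothetical sketch of the symbol table with enter and position.
table = [None]          # index 0 is reserved so that 0 can mean "not found"


def enter(name, kind, val=None, level=None, adr=None):
    """Append an entry; constants carry val, variables and procs level and adr."""
    table.append({"name": name, "kind": kind, "val": val,
                  "level": level, "adr": adr})


def position(name):
    """Search from the newest entry backwards; return 0 if the name is absent."""
    for i in range(len(table) - 1, 0, -1):
        if table[i]["name"] == name:
            return i
    return 0


enter("max", "constant", val=100)
enter("a", "variable", level=0, adr=3)
enter("a", "variable", level=1, adr=3)   # an inner declaration hides the outer one
print(position("max"), position("a"), position("undeclared"))
```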
Semantic analysis is responsible for checking that the meaning of a syntactically correct construct is valid.
The major semantic checks are to check that a variable has been declared and that it is of the correct type.
Consider the code to process an assignment statement:
begin (* statement *)
  if sym = ident then
  begin
    i := position(id);
    if i = 0 then error(11)
    else if table[i].kind <> variable then
    begin (* Assignment to non-variable *)
      error(12); i := 0
    end;
    getsym;
    if sym = becomes then getsym else error(13);
    expression(fsys);
    if i <> 0 then
      with table[i] do gen(sto, lev - level, adr)
  end
The syntax check for an identifier is followed by a semantic check for:
a) is the variable declared
  i := position(id)
  if i = 0 ....
b) is it of the correct type
  if table[i].kind <> variable then
  begin
    error(12);
Note that in the second case the index into the symbol table is used to access the kind of the variable.
4 Convert to reverse polish notation
The conversion from infix notation to reverse polish is as follows:
[Diagram: INPUT, OUTPUT and STACK]
INPUT consists of an expression in infix notation
while ( true ) begin
  get next symbol/token from INPUT
  identifier:
    move to OUTPUT
  operator:
    while priority(op on INPUT) <= priority(op on STACK) do
    begin
      pop item on STACK and move to OUTPUT
    end
    push operator on INPUT onto STACK
  '(':
    push '(' on STACK
  ')':
    while op on STACK <> '(' do
    begin
      pop item on STACK and move to OUTPUT
    end
    discard '(' on top of STACK
  end of expression:
    while STACK not empty do
    begin
      pop item on STACK and move to OUTPUT
    end
    exit
end
Priority of operators
High   **
       * /
Low    + -
program convertoreversepolish(input,output);

const MAX = 20;

var stack : record
              items : array[1..MAX] of char;
              tos   : 0 .. MAX
            end;

(*
 * Function to return the priority of an operator
 *)
function priority(c:char):integer;
begin
  if c in ['*','-','+','/','^','#','('] then
    case c of
      '+' : priority := 1;
      '-' : priority := 1;
      '*' : priority := 2;
      '/' : priority := 2;
      '^' : priority := 3;
      '#' : priority := -1; (* End marker in stack *)
      '(' : priority := -1;
    end
  else write('Panic: Error in stack ch = ', c );
end;

(*
 * Push an object onto the stack
 *)
procedure push(c:char);
begin
  if stack.tos >= MAX then
  begin
    writeln('Panic: Stack Full'); halt;
  end;
  stack.tos := stack.tos + 1; stack.items[stack.tos] := c;
end;

(*
 * Pull the top object from the stack
 *)
function pop:char;
begin
  if stack.tos = 0 then
  begin
    writeln('Panic: Stack empty'); halt;
  end;
  pop := stack.items[stack.tos]; stack.tos := stack.tos - 1;
end;

(*
 * Return the top item on the stack [Not removing the top item]
 *)
function lookattos:char;
begin
  lookattos := stack.items[stack.tos];
end;

procedure initialisestack;
begin
  stack.tos := 0;
end;

(*
 * Do all the work of converting infix expression to reverse polish
 *)
procedure convertoreversepolish;
var junk:char;
    c:char;
begin
  initialisestack;
  push('#'); (* End of stack marker *)
  repeat
    if eoln then c := '$' else read(c);
    if c in ['(',')','$','+','-','*', '/','^'] then
      case c of
        '(' :
          push('(');
        ')' :
          begin
            while not ( lookattos in ['(','#'] ) do
            begin
              write( pop );
            end;
            if lookattos = '(' then
              junk := pop { Dispose of '(' on stack }
            else
              writeln(' Error Missing ( ');
          end;
        '+','-','/','*','^' :
          begin
            while priority( c )<=priority( lookattos ) do
            begin
              write( pop );
            end;
            push( c );
          end;
        '$' :
          while lookattos <> '#' do write( pop );
      end
    else if c in ['a' .. 'z' ] then
      write(c)
    else begin
      writeln(' Panic: Ch ',c,' not valid Abort'); c:='$'
    end;
  until c = '$';
  readln;
  writeln;
end;

begin
  writeln('Infix to Reverse notation');
  while not eof do
  begin
    convertoreversepolish;
  end;
end .
Intermediate code of PL/0
5 Code generated by the Compiler
Code       Operation / Explanation
LIT a      push( literal a );
OPR op     T1:=pop; T2:=pop; push( T2 op T1 );
LOD l,a    push( location a in lexical level l );
STO l,a    Location a in lexical level l := pop;
INT a      tos := tos + a
JMP a      pc := a
JPC a      if ( pop = 0 ) pc := a
CAL l,a    Call routine at location a lexical level l
In the code the following variables are used:
pc    Is the program counter
tos   Is the top of stack
a     Is the address of the variable
l     Is the lexical level of the variable
Arguments to opr are:
Arg  Action      Arg  Action
0    Return      1    neg
2    +           3    -
4    *           5    DIV
6    ODD         7    Undefined
8    =           9    <>
10   <           11   >=
12   >           13   <=
5.1 Reverse Polish Notation
In this notation the operator comes after the operands
Infix Notation Reverse Polish Notation
A + B A B +
A + B * C A B C * +
(A + B) * (C + D) A B + C D + *
A + B * C + D A B C * + D +
In Reverse Polish notation there is no need to use brackets
The PL/0 intermediate code is in effect a Reverse Polish Instruction set
Thus the code to execute 1 + 2 * 5 is:
LIT 1
LIT 2
LIT 5
OPR *
OPR +
In reverse polish notation the expression is: 1 2 5 * +
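That this instruction sequence computes 1 + 2 * 5 can be checked with a small evaluation stack. The sketch below is in Python and covers only LIT and OPR. The function name run and the use of operator symbols as OPR arguments (rather than the numeric codes) are assumptions made for the example.

```python
# Hypothetical sketch: executing LIT and OPR on an evaluation stack.
import operator

# Floor division stands in for DIV; the two agree for non-negative operands.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}


def run(code):
    """Execute a list of (instruction, argument) pairs and return the result."""
    stack = []
    for instr, arg in code:
        if instr == "LIT":
            stack.append(arg)
        elif instr == "OPR":
            t1 = stack.pop()            # right-hand operand was pushed last
            t2 = stack.pop()            # left-hand operand
            stack.append(OPS[arg](t2, t1))
        else:
            raise ValueError("unknown instruction " + instr)
    return stack.pop()


program = [("LIT", 1), ("LIT", 2), ("LIT", 5), ("OPR", "*"), ("OPR", "+")]
print(run(program))   # prints 11
```

Because each operator finds its operands already on the stack, no brackets or priorities are needed at run time.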
Code generation by the PL/0 compiler
5.2 Code table
The code table is defined by:
type
  fct = (lit,opr,lod,sto,cal,int,jmp,jpc) (*functions*);
  instruction = packed record
    f: fct (* func. code*);
    l: 0..levmax (* level *);
    a: 0..amax (* displacement *);
  end;
var
  cx: integer (* code allocation index *);
  code: array [0..cxmax] of instruction;
Code is entered into the table with the procedure gen, the parameters to which are:
x: Function code
y: Lexical level
z: Offset in the lexical level
procedure gen( x:fct; y,z:integer ); { lines 154-161 }
begin
  if cx > cxmax then
  begin write(' program too long'); goto 99 end;
  with code[cx] do begin f := x; l := y; a := z end;
  cx := cx + 1
end (* gen *);
cx is a pointer to the next free cell in this table.
The "goto 99" is a panic exit taken when the code table is full.
5.3 Examples of intermediate code for PL/0 constructs
if hours > 40 then bonus := 30;
Address  Intermediate code
10       LOD hours
11       LIT 40
12       OPR >
13       JPC 16
14       LIT 30
15       STO bonus
16       ...

while count < 20 do
begin
  count := count + 1;
end;

Address  Intermediate code
10       LOD count
11       LIT 20
12       OPR <
13       JPC 19
14       LOD count
15       LIT 1
16       OPR +
17       STO count
18       JMP 10
19       ...
5.4 Pascal code to generate code for the "if condition then statement" construct
if sym = ifsym then
begin
  getsym; condition([thensym, dosym] + fsys);
  if sym = thensym then getsym
  else error(16);
  cx1 := cx; gen(jpc, 0, 0); statement(fsys);
  code[cx1].a := cx
end
The above performs both syntax analysis and code generation for the if statement.
cx is the index into the code table giving the next free cell for an instruction.
After generating code for the condition, which will leave the result (true, false) on top of the stack, the address of the next instruction to be generated is remembered in cx1.
Next the instruction JPC is generated; its address is the one held in cx1. The operand of this instruction (where to transfer control to if the condition is false) cannot yet be filled in and is set to 0.
Following this, the code for the statement is generated by a call to "statement".
Then the JPC instruction can be "patched" to fill in the address to transfer control to if the condition of the if statement were false (remember cx points to the next free instruction).
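The patching step can be seen in isolation in the sketch below, written in Python rather than Pascal. The syntax analysis is left out, and the helper names gen_if, condition and statement, together with the variable addresses 3 and 4, are assumptions made for the example.

```python
# Hypothetical sketch of back-patching for "if condition then statement".
code = []                       # the code table; len(code) plays the role of cx


def gen(f, l, a):
    code.append([f, l, a])      # a list, so that the operand can be patched later


def gen_if(gen_condition, gen_statement):
    gen_condition()             # leaves true/false on top of the stack
    cx1 = len(code)             # address that the JPC is about to occupy
    gen("jpc", 0, 0)            # jump target not yet known
    gen_statement()
    code[cx1][2] = len(code)    # patch: jump past the statement when false


def condition():                # hours > 40
    gen("lod", 0, 3)
    gen("lit", 0, 40)
    gen("opr", 0, 12)           # 12 is ">" in the table of OPR arguments


def statement():                # bonus := 30
    gen("lit", 0, 30)
    gen("sto", 0, 4)


gen_if(condition, statement)
for address, instr in enumerate(code):
    print(address, *instr)
```

With the table starting at address 0, the JPC lands at address 3 and is patched to jump to 6, the same shape as the example in 5.3 shifted down by 10.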
5.5 Pascal code to generate code for the "while condition do statement" construct
if sym = whilesym then
begin
  cx1 := cx; getsym; condition([dosym] + fsys);
  cx2 := cx; gen(jpc, 0, 0);
  if sym = dosym then getsym else error(18);
  statement(fsys); gen(jmp, 0, cx1); code[cx2].a := cx
end;
The above performs syntax analysis and code generation for the while statement.
cx is the index into the code table giving the next free cell for an instruction.
Before generating code for the condition, which will leave the result (true, false) on top of the stack, the address of the first instruction of the evaluation of the condition is remembered in cx1.
After a call to the procedure "condition" to generate the code for the condition, the address of the JPC instruction which will be generated next is remembered in cx2.
Next the instruction JPC is generated; its address is the one held in cx2. The operand of this instruction (where to transfer control to if the condition is false) cannot yet be filled in and is set to 0.
Following this, the code for the body of the while statement is generated (by a call to the procedure "statement").
Then the JMP instruction back to the top of the while loop is generated (the address of which is in cx1).
Then the JPC instruction can be "patched" to fill in the address to transfer control to if the condition of the while statement were false (remember cx points to the next free instruction).
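The two remembered addresses can be followed in the sketch below, written in Python rather than Pascal. Only the code generation is shown. The helper names gen_while, condition and body, and the variable address 3 for count, are assumptions made for the example.

```python
# Hypothetical sketch of code generation for "while condition do statement".
def gen_while(code, gen_condition, gen_body):
    cx1 = len(code)                 # start of the condition: loop-back target
    gen_condition(code)
    cx2 = len(code)                 # address of the JPC generated next
    code.append(["jpc", 0, 0])      # exit address not yet known
    gen_body(code)
    code.append(["jmp", 0, cx1])    # back to re-evaluate the condition
    code[cx2][2] = len(code)        # patch the exit address into the JPC


def condition(code):                # count < 20
    code.append(["lod", 0, 3])
    code.append(["lit", 0, 20])
    code.append(["opr", 0, 10])     # 10 is "<" in the table of OPR arguments


def body(code):                     # count := count + 1
    code.append(["lod", 0, 3])
    code.append(["lit", 0, 1])
    code.append(["opr", 0, 2])      # 2 is "+"
    code.append(["sto", 0, 3])


table = []
gen_while(table, condition, body)
for address, instr in enumerate(table):
    print(address, *instr)
```

Starting from address 0, the JPC at 3 is patched to 9 and the JMP at 8 returns to 0, matching the while example in 5.3 with every address reduced by 10.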
Consider the infix expression:
1 * 2 + 3 * 4
This can be thought of as a tree, with the operators with the highest priority at the bottom of the tree.
        +
      /   \
     *     *
    / \   / \
   1   2 3   4
The way of evaluating this tree is to work down the tree until a part of the subtree can be replaced with a result, and then to repeat the process until the whole tree is evaluated.
In this example (1*2) could be evaluated, then (3*4) [working left to right] and finally (1*2) + (3*4).
Formally this is evaluating the tree LHS item RHS (recursively).
The way the syntax diagrams are organised is to reflect this structure,
with EXPRESSION, TERM and FACTOR each being responsible for parsing a part of the tree:
        +            Expression
      /   \
     *     *         Term
    / \   / \
   1   2 3   4       Factor
An arithmetic expression is processed by using the procedures EXPRESSION, TERM and FACTOR to generate a tree which, if printed out in the order (LHS RHS item), will give the reverse polish formula for the infix expression.
The main complication is that the tree is not generated as such but is formed by the recursion of the 3 procedures EXPRESSION, TERM and FACTOR.
This means that no tree data structure is actually created; the recursive descent process performs both the parsing of the tree and the subsequent generation of the code.
Procedure TERM
begin (* term *)
  factor(fsys + [times, slash]);
  while sym in [times, slash] do
  begin
    mulop := sym; getsym;
    factor(fsys + [times, slash]);
    if mulop = times then gen(opr, 0, 4)
    else gen(opr, 0, 5)
  end
end (* term *);
The way the expression is parsed follows the pattern of the syntax diagrams,
which for TERM is:
[Syntax diagram for term: a factor, followed by zero or more repetitions of * or / and another factor]
Now the problem is that the correct order for generating reverse polish from the tree is
LHS RHS item
not
LHS item RHS
This is simply solved by delaying the output of the code for the operator * or / until after the RHS has been parsed.
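The effect of delaying the operator can be seen in the sketch below, a recursive descent translator written in Python rather than Pascal. It emits reverse polish text instead of calling gen, takes a ready-made token list, and omits the optional leading sign of an expression. The name to_rpn and these simplifications are assumptions made for the example.

```python
# Hypothetical sketch: recursive descent emitting reverse polish, no tree built.
def to_rpn(tokens):
    out = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        tok = peek()
        if tok == "(":
            advance()
            expression()                 # recursive call for a bracketed part
            if peek() != ")":
                raise SyntaxError("right parenthesis missing")
            advance()
        elif tok is not None and tok.isalnum():
            out.append(tok)              # an operand is emitted as soon as seen
            advance()
        else:
            raise SyntaxError("unexpected token %r" % (tok,))

    def term():
        factor()
        while peek() in ("*", "/"):
            mulop = peek()
            advance()
            factor()                     # parse the RHS first ...
            out.append(mulop)            # ... then emit the delayed operator

    def expression():
        term()
        while peek() in ("+", "-"):
            addop = peek()
            advance()
            term()
            out.append(addop)

    expression()
    if peek() is not None:
        raise SyntaxError("unexpected token %r" % (peek(),))
    return " ".join(out)


print(to_rpn("1 * 2 + 3 * 4".split()))   # prints 1 2 * 3 4 * +
```

The call stack of expression, term and factor takes the place of the tree, so each operator is held in a local variable until its right-hand operand has been emitted.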
Generation of code for the operands
Consider the code for FACTOR
procedure factor(fsys: symset);
var
  i: integer;
begin
  test(facbegsys, fsys, 24);
  while sym in facbegsys do
  begin
    if sym = ident then
    begin
      i := position(id);
      if i = 0 then error(11)
      else
        with table[i] do
          case kind of
            constant: gen(lit, 0, val);
            variable: gen(lod, lev - level, adr);
            proc: error(21)
          end;
      getsym
    end
    else if sym = number then
    begin
      if num > amax then
      begin error(31); num := 0 end;
      gen(lit, 0, num); getsym
    end
    else if sym = lparen then
    begin
      getsym;
      expression([rparen] + fsys);
      if sym = rparen then getsym
      else error(22)
    end;
    test(fsys, [lparen], 23)
  end
end (* factor *);
This is all done in factor, which checks that the operand is a number, a constant or a variable, and generates the correct code accordingly.
Note the semantic check on an identifier token to ensure that it is a variable or constant.
Also note the recursive call to expression to process an open bracket.
The Interpreter
The major variables used in the interpreter are:
s      ARRAY [1..stacksize] of integer;
       The stack in which all evaluations are done
code   The array of instructions to be executed
t      Top of stack, used as stack pointer
p      The program counter
b      Base of the current stack frame
The code for the add and sub instructions is:
case a of
  2: begin
       t := t - 1;
       s[t] := s[t] + s[t+1];
     end;
  3: begin
       t := t - 1;
       s[t] := s[t] - s[t+1];
     end;
5.6 Example of Lexical Levels
var a,b;
procedure p1;
var p1a,p1b;
begin
  a := 2; b := 3;
  p1a := 4; p1b := 5;
end;
begin
  a := -1; b := -2; call p1;
end.

 0 var a,b;
 1 procedure p1;
 1 var p1a,p1b;
 2 begin
 3   a := 2; b := 3;
 7   p1a := 4; p1b := 5;
11 end;
 2 int 0 5
 3 lit 0 2 { a := 2; }
 4 sto 1 3
 5 lit 0 3
 6 sto 1 4 { b := 3; }
 7 lit 0 4
 8 sto 0 3 { p1a := 4; }
 9 lit 0 5
10 sto 0 4 { p1b := 5; }
11 opr 0 0 { RETURN }
12 begin
13   a := -1; b := -2; call p1;
20 end .
12 int 0 5 { Space for vars/SL/DL/RA }
13 lit 0 1
14 opr 0 1
15 sto 0 3 { a := -1 }
16 lit 0 2
17 opr 0 1
18 sto 0 4 { b := -2 }
19 cal 0 2 { Call P1 }
20 opr 0 0 { Return }

Note that the lexical level in an instruction is the number of lexical levels back from the current one to the level in which the variable is stored.
 SL  DL  RA  a   b
+---+---+---+---+---+
| - | - | 0 | -1| -2|
+---+---+---+---+---+

SL  Static Link     Base of previous lexical level
DL  Dynamic Link    Base of previous active stack frame
RA  Return Address

 SL  DL  RA  a   b   SL  DL  RA  p1a p1b
+---+---+---+---+---+---+---+---+---+---+
| - | - | 0 | -1| -2| * | * | 20| 4 | 5 |
+---+---+---+---+---+---+---+---+---+---+
  ^                   |   |
  +-------------------+   |
  +-----------------------+

Note how the PL/0 compiler accesses variables:
Line 6   p1a := 4;   LIT 4    STO 0,3
Line 5   a := 2;     LIT 2    STO 1,3
Line 9   a := -1;    LIT -1   STO 0,3
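How the level difference in LOD and STO is resolved by following static links can be tried out with the sketch below, an interpreter for the instruction set written in Python rather than Pascal. It keeps the frame layout SL, DL, RA shown above. The names interpret and base, the halt-when-p-returns-to-0 convention, and the two jmp instructions placed at addresses 0 and 1 (which the listing does not show) are assumptions of the sketch.

```python
# Hypothetical sketch of an interpreter using static links (SL, DL, RA frames).
BINARY = {2: lambda x, y: x + y, 3: lambda x, y: x - y,
          4: lambda x, y: x * y, 5: lambda x, y: int(x / y),
          8: lambda x, y: int(x == y), 9: lambda x, y: int(x != y),
          10: lambda x, y: int(x < y), 11: lambda x, y: int(x >= y),
          12: lambda x, y: int(x > y), 13: lambda x, y: int(x <= y)}


def interpret(code, stacksize=100):
    s = [0] * (stacksize + 1)   # s[0] is unused; the stack starts at index 1
    t, b, p = 0, 1, 0           # top of stack, frame base, program counter

    def base(l):
        """Follow l static links back from the current frame."""
        b1 = b
        for _ in range(l):
            b1 = s[b1]
        return b1

    while True:
        f, l, a = code[p]
        p += 1
        if f == "lit":
            t += 1
            s[t] = a
        elif f == "opr" and a == 0:          # return: discard the frame
            t = b - 1
            p = s[t + 3]
            b = s[t + 2]
        elif f == "opr" and a == 1:          # negate
            s[t] = -s[t]
        elif f == "opr" and a == 6:          # odd
            s[t] = s[t] % 2
        elif f == "opr":
            t -= 1
            s[t] = BINARY[a](s[t], s[t + 1])
        elif f == "lod":
            t += 1
            s[t] = s[base(l) + a]
        elif f == "sto":
            s[base(l) + a] = s[t]
            t -= 1
        elif f == "cal":                     # build SL, DL, RA for the new frame
            s[t + 1] = base(l)
            s[t + 2] = b
            s[t + 3] = p
            b = t + 1
            p = a
        elif f == "int":
            t += a
        elif f == "jmp":
            p = a
        elif f == "jpc":
            if s[t] == 0:
                p = a
            t -= 1
        else:
            raise ValueError("unknown instruction " + f)
        if p == 0:                           # the main block has returned
            return s


program = [("jmp", 0, 12), ("jmp", 0, 2),    # assumed; not shown in the listing
           ("int", 0, 5), ("lit", 0, 2), ("sto", 1, 3), ("lit", 0, 3),
           ("sto", 1, 4), ("lit", 0, 4), ("sto", 0, 3), ("lit", 0, 5),
           ("sto", 0, 4), ("opr", 0, 0),
           ("int", 0, 5), ("lit", 0, 1), ("opr", 0, 1), ("sto", 0, 3),
           ("lit", 0, 2), ("opr", 0, 1), ("sto", 0, 4), ("cal", 0, 2),
           ("opr", 0, 0)]
stack = interpret(program)
print(stack[4], stack[5])    # a and b after the call to p1
```

In this run sto 1,3 inside p1 follows one static link from p1's frame back to the main frame, so it is the global a at stack position 4 that is updated.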
5.7 Use of a Display
Rather than use a chain of static links, it is better to use a display.
+---+
| * | ---> Base of global variables
+---+
| * | ---> Base of variables 1st lexical level
+---+
| * | ---> Base of variables 2nd lexical level
+---+
| * | ---> Base of variables 3rd lexical level
+---+
etc
This would mean having to change:
1) Code generation of LOD/STO instructions
2) The interpreter, to maintain the display
3) The definition of the call instruction and exit instruction (OPR 0), to maintain the display
The reason for using a display is that accessing the lexical levels by chaining through static links at run time would be grossly inefficient for compiled code.
Usually when using a display in a compiled program the display is held in registers; in this way the addressing mode base+offset can be used.
Thus if register 0 contained the base of the current lexical level, then to access the variable at displacement 4 in that lexical level the following addressing mode could be used:
MOV 4(r0),TARGET {PDP-11/VAX instruction}
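The difference between the two schemes can be sketched as follows, in Python. The function names and the stack positions used (main frame at 1, p1 frame at 6) are assumptions made for the example; maintaining the display on call and return is not shown.

```python
# Hypothetical sketch contrasting static-link chaining with a display.
def base_by_chain(s, b, l):
    """Follow l static links, starting from the frame whose base is b."""
    while l > 0:
        b = s[b]        # the static link is stored at the base of each frame
        l -= 1
    return b


def base_by_display(display, current_level, l):
    """With a display, the base of any enclosing level is a single lookup."""
    return display[current_level - l]


s = [0] * 11
s[6] = 1                # static link of p1's frame points at the main frame
display = [1, 6]        # display[0]: global variables, display[1]: level 1

print(base_by_chain(s, 6, 1), base_by_display(display, 1, 1))
```

Both calls give the base of the global frame, but the chained version takes one memory access per level crossed while the display takes one access regardless of depth.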