
Intermediate code generation - MDH Runtime environments - Chapter 7 We need to translate high level code like “trac42” into machine code. The instruction set of trac42VM (virtual


Intermediate code generation

Compiler phases, in order: lexical analysis, syntactical analysis, semantical analysis, intermediate code generation, optimization, code generation, target specific optimization.


Runtime environments - Chapter 7

We need to translate high-level code like “trac42” into machine code.

The instruction set of trac42VM (virtual machine):
● Can be seen as an intermediate representation
● Can at the same time be executed by the interpreter trac42i

Difference from assignment 1 - we now have type checking:
● We want to generate “real code”
● We need to consider the types of data
● Comparing two strings differs from comparing two integers
● Different machine instructions are needed


Code and data

In the source language, data and executable code are put on the same “sheet of paper”, as declarations and statements in the source code file.

When executing a program we need to differentiate between the code and the data.

The main purpose of this chapter for us:
● We need to understand the translation from a representation of a high-level language to stack machine code
● We need to understand how the stack machine code operates on the stack

Some issues:
● We can't use labels – only real addresses
● Functions can be called from different contexts
● Functions take parameters and return values
● Different data types differ in size
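The first issue, replacing labels with real addresses, can be handled with two passes over the generated code. The following is a minimal sketch in Python, not the course's actual tooling. It assumes a hypothetical representation where the code is a list of strings and a `[name]` marker occupies one address, as in the listings later in these slides. The function name `resolve_labels` and the symbolic form `BSR twice` are made up for illustration.

```python
def resolve_labels(code, start=0):
    """Replace symbolic BSR targets with numeric addresses.

    Assumption: every entry, including a "[name]" marker, occupies one
    address, and the first entry is placed at address `start`.
    """
    # Pass 1: record the address of every label marker.
    addresses = {}
    for addr, instr in enumerate(code, start):
        if instr.startswith("[") and instr.endswith("]"):
            addresses[instr[1:-1]] = addr

    # Pass 2: rewrite symbolic call targets to real addresses.
    resolved = []
    for instr in code:
        op, _, arg = instr.partition(" ")
        if op == "BSR" and arg in addresses:
            instr = f"BSR {addresses[arg]}"
        resolved.append(instr)
    return resolved


program = ["[twice]", "LINK", "UNLINK", "RTS",
           "[trac42]", "LINK", "BSR twice", "UNLINK", "RTS"]
resolved = resolve_labels(program, start=2)
```

Two passes are used so that a call to a function defined later in the file can still be resolved.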


Memory arrangement

Memory is partitioned into different sections:
● Executable code of a program
● Stack – e.g. local variables
● Static – global and static memory
● Heap – dynamically allocated memory


Functions

A function definition:

   void foo()
   {
      write "Hello";
   }

Means: “foo” is a label for the code between the braces, the “body”.

When “foo” is used in some other code, the body of “foo” should be executed.

We may also need to handle recursive functions, i.e. “foo” may be referred to in the body of “foo”.


Functions

Another function definition:

   int twice(int x)
   {
      int y;
      y = 2 * x;
      return y;
   }

A definition like the ones above (in languages like C and trac42) also serves as a declaration of the interface to the function.

This declaration states:
● twice needs to be called with a single int parameter
● twice returns a value of type int

This is useful for the type analysis (as you already know).

It's also needed to be able to generate code correctly.


Function calls

   . . .
   a = twice(b + 1) + 1;
   . . .

● Ensure that the body of twice is executed
● You must also handle parameters and return value!

Before the execution of the body of twice:
● The assignment x_twice = b + 1 must take place

When execution has finished:
● The call expression is replaced by the returned value in the expression of the call
● Execution continues after the call, computing a = retval_twice + 1

To ensure that the caller's execution resumes correctly, we may also need to:
● Save the processor state
● Save the return-to address

Do this before invoking the function.


Function calls' contexts

   void bar()
   {
      ...
      a = twice(b + 1);
      baz();
      ...
   }

   void foo()
   {
      ...
      x = twice(x + 2) + 2;
      ...
   }

We also need to handle the function calls differently in different contexts:

   x_twice = b_bar + 1
   "execute body of twice"
   a_bar = retval;
   ...
   x_twice = x_foo + 2
   "execute body of twice"
   x_foo = retval + 2;

We use a stack to accomplish this.


The stack

A stack is used to convey the needed information:
● To called functions
● From called functions
● Necessary program state (e.g. the return-to address)

The information on the stack associated with one function call is called an activation record (or sometimes a stack frame).


Layout of an activation record

From the bottom of the stack to the top:

● Return value (bottom)
● Actual arguments
● Return address
● Frame pointer
● Local variables
● Temporary variables (top)


Mapping to C

Figure: the activation record layout (return value, actual arguments, return address, frame pointer, local variables, temporary variables) shown next to the following code:

   int twice(int x)
   {
      int y;
      y = 2 * x;
      return y;
   }

   . . .
   a = twice(13);
   x = ...


Example with numbers

Statement numbers are shown to the left of the code.

      int twice(int x)
      {
   3     int y;
   4     y = 2 * x;
   5     return y;
      }

   7  a = twice(42);
   8  x = ...

Activation record needed before executing stmt 4!

   Address  Content
   42       Caller's record
   41       Return value
   40       Actual arguments
   39       Return address
   38       Frame pointer     <- FP
   37       Local variables   <- SP

FP and SP are global registers holding the frame pointer and the stack pointer.


Mapping to stack

The call a = twice(13) is traced step by step. Statement numbers are shown to the left of the code.

      int twice(int x)
      {
   3     int y;
   4     y = 2 * x;
   5     return y;
      }

   7  a = twice(13);   /* -> 26 */
   8  x = ...

Address 42 holds the caller's record throughout. The columns show the content of addresses 41 to 37 after each step; ?? marks a slot that has not been assigned yet.

   Step  SP  FP  41 Return   40 Actual   39 Return   38 Frame   37 Local   Change
                 value       arguments   address     pointer    variables
   1     42  42                                                            Start here
   2     41  42  ??                                                        Return value space reserved
   3     40  42  ??          13                                            Actual argument pushed
   4     39  42  ??          13          8 *                               Return address pushed
   5     38  38  ??          13          8           42                    FP saved and re-set
   6     37  38  ??          13          8           42         ??         Local variable y reserved
   7     37  38  ??          13          8           42         26         y = 2 * x
   8     37  38  26          13          8           42         26         Return value assigned
   9     39  42  26          13          8                                 FP restored, locals removed
   10    40  42  26          13                                            Return address popped
   11    41  42  26                                                        Actual arguments popped

* The return address does not map to the high-level view.


Responsibility

- Who does what in the administration of function invocation?

Calling function (function call)
● Reserve return value space
● Push actual arguments
● Push the return-to address
● Call the function
● Pop actual arguments

Called function (function definition)
● Save and re-set FP
● Reserve space for local variables
● Assign the return value
● Restore FP
● Return to caller

The task for your compiler is to generate code that handles this!
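The division of responsibility above can be played through by hand. Below is a minimal sketch in Python that does so for the call twice(13) with return-to address 8, using the same addresses as the stack slides (SP = FP = 42 at the start). It assumes a stack that grows towards lower addresses with SP pointing at the most recently pushed slot. The class `Machine` and the function `call_twice` are hypothetical names used only for this illustration.

```python
class Machine:
    """Downward-growing stack; SP points at the most recently pushed slot."""

    def __init__(self, sp, fp):
        self.mem = {}
        self.sp = sp
        self.fp = fp
        self.trace = []          # SP after every push/pop

    def push(self, value):
        self.sp -= 1
        self.mem[self.sp] = value
        self.trace.append(self.sp)

    def pop(self):
        value = self.mem.pop(self.sp)
        self.sp += 1
        self.trace.append(self.sp)
        return value


def call_twice(m, arg, return_to):
    # Calling function
    m.push(None)                 # reserve return value space
    m.push(arg)                  # push actual argument
    m.push(return_to)            # push return-to address (part of the call)

    # Called function
    m.push(m.fp)                 # save FP ...
    m.fp = m.sp                  # ... and re-set it
    m.push(None)                 # reserve space for local variable y
    m.mem[m.fp - 1] = 2 * m.mem[m.fp + 2]   # y = 2 * x
    m.mem[m.fp + 3] = m.mem[m.fp - 1]       # assign the return value
    m.pop()                      # drop local variable y
    m.fp = m.pop()               # restore FP
    resume_at = m.pop()          # return to caller

    # Calling function again
    m.pop()                      # pop actual argument
    return m.pop(), resume_at    # the return value is now on top


m = Machine(sp=42, fp=42)
result, resume_at = call_twice(m, 13, return_to=8)
```

After the call, SP and FP are back at their starting values, which is the property the stack trace in the slides illustrates.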


The task for the compiler

Calculate the actual addresses of:
● Variables
● Parameters
● Return value

Then generate instructions using the proper addresses.

The FP is there to simplify the task for you.
- You just need to calculate the offset relative to FP!

We split the work into two tasks (two passes):
● Calculate offsets
● Generate code using the offsets
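The first pass, offset calculation, could look roughly like the sketch below. It is written in Python under several assumptions that are not stated in the slides: arguments are pushed left to right, an object's offset refers to its lowest stack position, and the return address occupies the single position at 1(FP). The function name `frame_offsets` and the key `<return value>` are hypothetical.

```python
def frame_offsets(params, local_vars, return_size=0):
    """Compute FP-relative offsets.

    params and local_vars are lists of (name, size) in declaration order.
    Assumed layout, from high to low addresses: return value, actual
    arguments, return address, saved FP (where FP points), local variables.
    """
    offsets = {}

    # Local variables lie below the saved FP, at negative offsets.
    offset = 0
    for name, size in local_vars:
        offset -= size
        offsets[name] = offset

    # 1(FP) holds the return address; the last pushed argument follows it.
    offset = 2
    for name, size in reversed(params):
        offsets[name] = offset
        offset += size

    # The return value space was reserved before the arguments were pushed.
    if return_size:
        offsets["<return value>"] = offset
    return offsets


twice_offsets = frame_offsets(params=[("x", 1)], local_vars=[("y", 1)], return_size=1)
```

With every object taking one stack position, this gives -1(FP) for y, 2(FP) for x and 3(FP) for the return value of twice.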


Offset calculation

For the moment: assume that all stack objects take 1 stack position. What are the offsets for y, x and the return value?

      int twice(int x)
      {
   3     int y;
   4     y = 2 * x;
   5     return y;
      }

   7  a = twice(13);
   8  x = ...

   Address  Content           Value
   42       Caller's record
   41       Return value      26
   40       Actual arguments  13
   39       Return address    8
   38       Frame pointer     42    <- FP
   37       Local variables   26    <- SP


Code to be generated

Function call

If the function returns a value, you must reserve space for the return value. Note! This needs special care in a language like C, where a function's result may be ignored.
   DECL #size (or PUSH)

For each actual argument, PUSH its value on the stack.

Call the function and push the return-to address at the same time.
   BSR #address

Pop the actual arguments.
   POP #size
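As a rough illustration of the call sequence above, here is a small Python sketch of a generator for it. The function `gen_call` and its parameters are hypothetical. It assumes that the code for each argument has already been generated and leaves the argument's value on the stack, and it emits one POP per argument, written without `#` as in the listings later in these slides.

```python
def gen_call(address, arg_code, arg_sizes, return_size):
    """Emit stack machine code for one function call.

    address     -- real address of the called function
    arg_code    -- one instruction list per actual argument, in push order
    arg_sizes   -- size of each actual argument, in the same order
    return_size -- size of the return value, 0 for a void function
    """
    code = []
    if return_size:
        code.append(f"DECL #{return_size}")   # reserve return value space
    for instructions in arg_code:
        code.extend(instructions)             # push the actual arguments
    code.append(f"BSR {address}")             # call, pushing the return-to address
    for size in reversed(arg_sizes):
        code.append(f"POP {size}")            # pop the actual arguments
    return code


call_code = gen_call(address=2, arg_code=[["PUSHINT #13"]], arg_sizes=[1], return_size=1)
```

Here `call_code` corresponds to the call twice(13) when twice is placed at address 2.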


Code to be generated

Function definition

Push FP and set FP to a stack address that is fixed throughout the function's execution.
   LINK

Reserve space for all the local variables.
   DECL #size

Restore FP to the value it had when entering this function.
   UNLINK

Return to the caller.
   RTS
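The frame around a function body can be sketched in the same way. The Python function `gen_function` below is a hypothetical helper that wraps already generated body code in the instructions listed above. It assumes that all local variables are reserved with a single DECL and that the function's label is written as `[name]`.

```python
def gen_function(name, local_sizes, body_code):
    """Wrap the generated body of a function in its entry and exit code."""
    code = [f"[{name}]", "LINK"]              # label, then save and re-set FP
    total = sum(local_sizes)
    if total:
        code.append(f"DECL #{total}")         # reserve space for local variables
    code.extend(body_code)                    # code generated for the statements
    code += ["UNLINK", "RTS"]                 # restore FP, return to the caller
    return code


# Body of: void trac42() { int a; a = twice(13); } with twice at address 2.
body = ["LVAL -1(FP)", "DECL #1", "PUSHINT #13", "BSR 2", "POP 1", "ASSINT"]
function_code = gen_function("trac42", local_sizes=[1], body_code=body)
```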


A complete example

   int twice(int x)
   {
      int y;
      y = 2 * x;
      return y;
   }

   void trac42()
   {
      int a;
      a = twice(13);
   }

Generated code, with addresses to the left and a comment to the right:

    2 [twice]
    3 LINK              save FP
    4 DECL #1           y
    5 LVAL -1(FP)       &y
    6 PUSHINT #2        2
    7 RVALINT 2(FP)     x
    8 MULT              *
    9 ASSINT            y=
   10 LVAL 3(FP)        &retval
   11 RVALINT -1(FP)    y
   12 ASSINT            retval=
   13 UNLINK            restore FP
   14 RTS
   15 UNLINK            restore FP
   16 RTS
   17 [trac42]
   18 LINK              save FP
   19 DECL #1           a
   20 LVAL -1(FP)       &a
   21 DECL #1           retval
   22 PUSHINT #13       args
   23 BSR 2             call twice
   24 POP 1             pop args
   25 ASSINT            a=
   26 UNLINK            restore FP
   27 RTS
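To experiment with the listing above without trac42i, a tiny interpreter for just these instructions can be sketched in Python. The instruction semantics below are assumptions inferred from the stack slides, not the definition of trac42VM: the stack grows towards lower addresses, SP points at the top element, LINK pushes FP and sets FP to SP, UNLINK sets SP to FP and pops FP, and ASSINT pops a value and then an address. The names `run` and `HALT` are hypothetical.

```python
import re

LISTING = """\
2 [twice]
3 LINK
4 DECL #1
5 LVAL -1(FP)
6 PUSHINT #2
7 RVALINT 2(FP)
8 MULT
9 ASSINT
10 LVAL 3(FP)
11 RVALINT -1(FP)
12 ASSINT
13 UNLINK
14 RTS
15 UNLINK
16 RTS
17 [trac42]
18 LINK
19 DECL #1
20 LVAL -1(FP)
21 DECL #1
22 PUSHINT #13
23 BSR 2
24 POP 1
25 ASSINT
26 UNLINK
27 RTS
"""

HALT = -1  # sentinel return address, used only to stop this sketch


def run(listing, entry, sp=42, fp=42):
    code = {}
    for line in listing.strip().splitlines():
        addr, instr = line.split(maxsplit=1)
        code[int(addr)] = instr
    mem = {}

    def push(value):
        nonlocal sp
        sp -= 1
        mem[sp] = value

    def pop():
        nonlocal sp
        value = mem[sp]
        sp += 1
        return value

    def number(text):
        # "#13" -> 13, "1" -> 1, "-1(FP)" -> the address FP - 1
        m = re.fullmatch(r"(-?\d+)\(FP\)", text)
        return fp + int(m.group(1)) if m else int(text.lstrip("#"))

    push(HALT)
    pc = entry
    while pc != HALT:
        op, _, arg = code[pc].partition(" ")
        pc += 1
        if op.startswith("["):
            continue                      # a label does nothing
        elif op == "LINK":
            push(fp)
            fp = sp
        elif op == "UNLINK":
            sp = fp
            fp = pop()
        elif op == "DECL":
            sp -= number(arg)
        elif op == "POP":
            sp += number(arg)
        elif op in ("PUSHINT", "LVAL"):
            push(number(arg))             # a constant or an address
        elif op == "RVALINT":
            push(mem[number(arg)])
        elif op == "MULT":
            right, left = pop(), pop()
            push(left * right)
        elif op == "ASSINT":
            value, address = pop(), pop()
            mem[address] = value
        elif op == "BSR":
            push(pc)                      # return-to address
            pc = number(arg)
        elif op == "RTS":
            pc = pop()
        else:
            raise ValueError(f"unknown instruction: {op}")
    return mem, sp, fp


mem, sp, fp = run(LISTING, entry=17)
```

With the starting values SP = FP = 42, the variable a of trac42 ends up at address 39 in this sketch, and SP and FP return to 42 when the program has terminated.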


A stack trace

● When starting a program the stack is empty
● However, SP and FP always have some values
● The initial value of FP is uninteresting in our context
● The initial value of SP is usually set to leave sufficient space for the program
● When a program has terminated the stack should be empty - the SP should have regained its initial value
● Note that the semantics of UNLINK may hide stack errors. You need to check that SP is restored to the position of the last local variable before executing UNLINK
● You can use trac42i to do traces
● You will however need to do some practicing on paper to get how it all works ...


News with the stack machine code

● Sizes matter, e.g. size of(INT) != size of(STRING)
● Operations that depend on the size have a type tag, e.g.
   ● PUSHINT, PUSHBOOL, PUSHSTR
   ● RVALINT, ...
   ● ASSINT, ...
   ● EQINT, ...
   ● LTINT, ...
   ● LEINT, ...
   ● READINT, ...
   ● WRITEINT, ...
● LINK and UNLINK are used to handle FP
● BSR and RTS are used to call/return. BSR needs the actual address
● Instructions referring to the stack need the offset to FP
● The size of a string is 100; the size of everything else (including pointers) is 1
● You should be prepared for any size to appear...
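One way to keep type tags and sizes in one place is a pair of lookup tables. The Python sketch below is only an illustration: the tags INT, BOOL and STR are taken from PUSHINT, PUSHBOOL and PUSHSTR above, but the assumption that every tagged operation uses the same three tags (for example EQSTR) is a guess from the naming pattern. The helper names `typed` and `size_of` are hypothetical.

```python
# Sizes in stack positions, as stated above: string is 100, everything else is 1.
SIZES = {"int": 1, "bool": 1, "string": 100}

# Type tags as they appear in PUSHINT, PUSHBOOL and PUSHSTR.
TAGS = {"int": "INT", "bool": "BOOL", "string": "STR"}


def typed(operation, type_name):
    """Return the type-tagged instruction name, e.g. typed("PUSH", "int")."""
    return operation + TAGS[type_name]


def size_of(type_name):
    """Return the number of stack positions an object of this type takes."""
    return SIZES[type_name]


# Total size of the actual arguments of print(string, int, string, int).
print_args_size = sum(size_of(t) for t in ["string", "int", "string", "int"])
```

Keeping the sizes in a table rather than hard-coding them makes it easier to be prepared for any size to appear.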


Exercise

Show the stack content as it is before instruction 13 if SP = 100 and FP = 102.

   int twice(int x)
   {
      int y;
      y = 2 * x;
      return y;
   }

   void trac42()
   {
      int a;
      a = twice(13);
   }

    2 [twice]
    3 LINK
    4 DECL #1
    5 LVAL -1(FP)
    6 PUSHINT #2
    7 RVALINT 2(FP)
    8 MULT
    9 ASSINT
   10 LVAL 3(FP)
   11 RVALINT -1(FP)
   12 ASSINT
   13 UNLINK
   14 RTS
   15 UNLINK
   16 RTS
   17 [trac42]
   18 LINK
   19 DECL #1
   20 LVAL -1(FP)
   21 DECL #1
   22 PUSHINT #13
   23 BSR 2
   24 POP 1
   25 ASSINT
   26 UNLINK
   27 RTS


Reverse this to t42-code! (Invent names of identifiers when needed.)

    2 [id]
    3 LINK
    4 RVALINT 2(FP)
    5 RVALINT 3(FP)
    6 ADD
    7 WRITEINT
    8 POP 1
    9 UNLINK
   10 RTS
   11 [trac42]
   12 LINK
   13 DECL #1
   14 LVAL -1(FP)
   15 PUSHINT #42
   16 ASSINT
   17 PUSHINT #27
   18 RVALINT -1(FP)
   19 PUSHINT #99
   20 ADD
   21 BSR 2
   22 POP 1
   23 POP 1
   24 UNLINK
   25 RTS


Encode this to trac42VM-code (stack machine code)

   void print(string s1, int x, string s2, int y)
   {
      write s1;
      write x;
      write s2;
      write y;
   }

   void trac42()
   {
      print("Here comes one:", 1, "Here comes two:", 2);
   }


Parameter passing

● In trac42 we use call by value – we get what we write.

● Same in C

● Another approach is call by reference – a pointer to what we write is pushed.

● E.g. Java objects and references in C++.

● Usually combined with call by value

● Other:

● Call by name – similar to macros in C

● Copy restore – like the name suggests...
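The difference between call by value and call by reference can be shown with a toy memory. In the Python sketch below, `memory` is a hypothetical list standing in for the machine's memory, and the two functions are invented for illustration: the first receives a copy of the value, the second receives the address of the caller's variable.

```python
def increment_by_value(value):
    value += 1                 # changes only the callee's own copy
    return value


def increment_by_reference(memory, address):
    memory[address] += 1       # changes the caller's variable through its address


memory = [10]                  # the caller's variable lives at address 0

increment_by_value(memory[0])  # the value 10 is passed
after_value = memory[0]

increment_by_reference(memory, 0)   # the address 0 is passed
after_reference = memory[0]
```

After the first call the caller's variable is unchanged; after the second it has been incremented.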


Memory allocation

● In trac42 we use stack allocation only.

● Another approach is static allocation for variables

● Disadvantage: does not allow for recursion

● Advantage: May be fast, no risk for stack overflow...

● Heap allocation

● Java objects

● Many functional languages

● Do not confuse with explicit memory allocation like in C/C++

Methods are usually mixed

● Globals and static locals in C are static

● Locals are on the stack


Local functions

Some languages allow for nested (local) functions:

   main() {
      int x;
      foo(int y) {
         if y > 0 foo(y - 1);
         x = 42;
      }
      foo(2);
   }

How to find x on the stack?

How to find y on the stack, not confusing it with the y of the (recursive) caller?


Local functions

We add a so-called access link to the stack frame.

This can be seen as the next pointer in a linked list, where the records in the linked list are the activation records.

Figure: activation record with the fields return value, actual arguments, return address, frame pointer and local variables, extended with an access link.

The “linked list model” is not exactly true, because there may sometimes be short-cuts in the “linked list”.


Local functions

Generate code that for each access of non-local variables follows the access links to the activation record of the function to which the variable belongs.

● The relative nesting depth between the function of declaration and the function of the access determines the code to generate.

Generate code that sets the access link. That code must be generated at the call site.

● The relative nesting depth between the caller and the called function determines the code to generate.
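Both rules can be tried out on a small model of the main/foo example. The Python sketch below is an illustration under assumptions, not generated trac42VM code: activation records are modelled as `Frame` objects, nesting depth is counted from 0 for main, and the helper names `frame_at` and `call` are hypothetical.

```python
class Frame:
    """Model of an activation record with an access link."""

    def __init__(self, depth, access_link=None):
        self.depth = depth                # nesting depth of the function
        self.access_link = access_link    # frame of the enclosing function
        self.variables = {}


def frame_at(frame, hops):
    """Follow the access links `hops` times."""
    for _ in range(hops):
        frame = frame.access_link
    return frame


def call(caller, callee_depth):
    """Create the callee's frame and set its access link at the call site."""
    # The enclosing function of the callee has depth callee_depth - 1.
    hops = caller.depth - callee_depth + 1
    return Frame(callee_depth, access_link=frame_at(caller, hops))


main = Frame(depth=0)
main.variables["x"] = 0
foo2 = call(main, callee_depth=1)     # foo(2), called from main
foo1 = call(foo2, callee_depth=1)     # foo(1), called from foo
foo0 = call(foo1, callee_depth=1)     # foo(0), called from foo

# x = 42 inside foo: x is declared at depth 0 and accessed at depth 1.
frame_at(foo0, foo0.depth - main.depth).variables["x"] = 42
```

Every activation of foo gets main's frame as its access link, regardless of how deep the recursion is, so x is always found with one hop.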