Memory Layout

Compiler, Baojian Hua, bjhua@ustc.edu.cn


Middle and Back End

[Figure: compilation pipeline. AST -> (translation) -> IR1 -> (translation) -> IR2 -> (more IRs and translation) -> asm]

Sources and IRs

CODE: Procedures, Control Flow, Statements, Data Access

DATA: Global Static Variables, Global Dynamic Data, Local Variables, Temporaries, Parameter Passing, Read-only Data

A code generator should…

Translate all “CODE” to machine (or assembly) instructions; this part is target-dependent.

Allocate space for variables, etc. (“DATA”).

Respect the calling conventions and other constraints.

To do all this, we must know the details of modern processors and their impact on code generation.

Overview of a modern processor

[Figure: block diagram of a processor with four parts: Memory, Registers, ALU, Control]

Arithmetic and Logic Unit

Most arithmetic and logic operations, e.g.:

    addl %eax, %ebx
    incl 4(%ecx)

Operands: immediate, register, memory


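Operations may have constraints. How to perform a division? The dividend must first be set up in %edx:%eax:

    cltd; idivl ...

Operations may raise exceptions, e.g. idivl with a zero divisor.

Operations come in variants for different operand sizes: addb, addw, addl, addq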


Control

Executing instructions: instructions are in memory (pointed to by the PC).

    for (;;) {
        instruction = *PC;
        PC++;
        execute (instruction);
    }
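The loop above can be made concrete with a toy machine. The sketch below is an illustration only: the three-opcode instruction set (`OP_INC`, `OP_ADD`, `OP_HALT`), the `struct instruction` layout and the single accumulator are invented for this example and do not model any real ISA.

```c
/* Hypothetical toy instruction set, invented for illustration. */
enum opcode { OP_INC, OP_ADD, OP_HALT };

struct instruction {
    enum opcode op;
    int operand;                          /* used by OP_ADD only */
};

/* Fetch-execute loop: instructions live in memory, pc points at the next one. */
int run(const struct instruction *memory)
{
    const struct instruction *pc = memory;
    int acc = 0;                          /* a single "register" */

    for (;;) {
        struct instruction instr = *pc;   /* fetch */
        pc++;                             /* advance the program counter */
        switch (instr.op) {               /* execute */
        case OP_INC:  acc += 1;             break;
        case OP_ADD:  acc += instr.operand; break;
        case OP_HALT: return acc;
        }
    }
}
```

Running the three-instruction program INC, ADD 41, HALT leaves 42 in the accumulator.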


Registers

Limited in number but high-speed: 8 general-purpose registers on 32-bit x86, more on RISC machines.

Most are general-purpose, but some have special uses.


Memory

The address space is how programs see and use memory; it is highly architecture- and OS-dependent. Below is the typical layout of 32-bit x86/Linux, from high to low addresses:

    0xffffffff
        OS
    0xc0000000
        stack
        heap
        data
        text
    0x08048000
    0x00100000
        BIOS, VGA
    0x00000000
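One way to see these regions from inside a program is to record one address from each. This is a rough sketch, not a portable tool: the names `sample_regions` and `struct region_sample` are invented here, and on a modern 64-bit system with address-space randomization the numeric values will not match the 32-bit figures above.

```c
#include <stdint.h>
#include <stdlib.h>

int initialized_global = 1;                 /* data segment */
const char read_only_text[] = "hello";      /* read-only data */

struct region_sample {
    uintptr_t text, rodata, data, heap, stack;
};

/* Record one address from each region; the values are platform-specific. */
struct region_sample sample_regions(void)
{
    struct region_sample s;
    int local = 0;                          /* lives in this call's stack frame */
    void *block = malloc(4);                /* lives on the heap */

    s.text   = (uintptr_t)&sample_regions;  /* code address */
    s.rodata = (uintptr_t)read_only_text;
    s.data   = (uintptr_t)&initialized_global;
    s.heap   = (uintptr_t)block;
    s.stack  = (uintptr_t)&local;
    free(block);
    return s;
}
```

Printing the five fields with `%p`-style formatting on a given machine shows which regions sit high and which sit low there.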

Read Only Data

    // C code
    char *s = "hello";
    void f ()
    { printf(s); }

    // x86 code
        .text
    f:
        pushl $s
        call printf
    s:
        .string "hello"

Global Static Variables

    // C code
    int d = 1;
    void f (){
        d++;
    }

    // x86 code
        .text
    f:
        movl d, %eax
        incl %eax
        movl %eax, d
        .data
    d:
        .int 1

Global Dynamic Data

    // C code
    void f (){
        malloc(4);
    }

    // x86 code
        .text
    f:
        pushl $4
        call malloc
        movl %eax, %ebx


Function, or procedure, or method, or …

A high-level abstraction of code: logically grouped.

Good for many things: design and abstraction; development, testing, maintenance and evolution; …

Implementation? We start with C-style functions and deal with more advanced forms later.

API & ABI

Application Programming Interface: interfaces between source programs.

Application Binary Interface: contracts between binary programs, even when compiled from different languages by different compilers. It fixes conventions on low-level details:

  how to pass arguments?

  how to return values?

  how to make use of registers?

  …

We posted the x86 ABI document on the course page.

Parameter Passing

Parameter passing

Must answer two questions:

  what to pass? call-by-value, call-by-reference, call-by-need, …

  how to pass? the calling convention

http://en.wikipedia.org/wiki/X86_calling_conventions

Call-by-reference

In languages such as C++.

Arguments are escaped (their addresses are taken), so can they not be constants?

Actual arguments and formal parameters are aliases.

    // C++ style reference:
    int f (int &x, int y)
    {
        x = 3;
        y = 4;
        return 0;
    }

    // a call
    f (a, b);

Simulating call-by-reference

    // original C++ code:
    int f (int &x, int y)
    {
        x = 3;
        y = 4;
        return 0;
    }

    // a call
    f (a, b);

    // simulated:
    int f (int *x, int y)
    {
        *x = 3;
        y = 4;
        return 0;
    }

    // the call becomes:
    f (&a, b);

Moral

Call-by-reference is widely considered a wrong design in C++:

  the code is inherently inefficient!

  the code is ambiguous in nature: x = 4; (does this write a local variable or the caller’s?)

A variant of this is the so-called call-by-value/result: it looks like call-by-value, but its effect is visible to the caller.

Call-by-value/result

Upon call, the actual arguments are copied.

But the callee only modifies a local version.

Upon exit, the callee copies the local version back to the actual arguments.

    // code:
    int f (int @x, int y)
    {
        x = 3;
        y = 4;
        return 0;
    }

    // a call
    f (a, b);

Simulating call-by-value/result

    // original code:
    int f (int @x, int y)
    {
        x = 3;
        y = 4;
        return 0;
    }

    // a call
    f (a, b);

    // simulated:
    int f (int *x, int y)
    {
        int temp = *x;
        temp = 3;
        y = 4;
        *x = temp;
        return 0;
    }

    // the call becomes:
    f (&a, b);
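Call-by-reference and call-by-value/result can be told apart when the same variable is passed for two parameters. The sketch below simulates both with pointers; the function names are invented for this illustration and are not part of the slides.

```c
/* Call-by-reference, simulated with pointers:
   every access goes straight to the caller's variable. */
void bump_both_by_reference(int *x, int *y)
{
    *x = *x + 1;
    *y = *y + 1;
}

/* Call-by-value/result: copy in, work on locals, copy out on exit. */
void bump_both_by_value_result(int *x, int *y)
{
    int local_x = *x;            /* copy in */
    int local_y = *y;
    local_x = local_x + 1;
    local_y = local_y + 1;
    *x = local_x;                /* copy out */
    *y = local_y;
}

/* Pass the same variable twice and report its final value. */
int aliased_by_reference(void)
{
    int a = 0;
    bump_both_by_reference(&a, &a);
    return a;
}

int aliased_by_value_result(void)
{
    int a = 0;
    bump_both_by_value_result(&a, &a);
    return a;
}
```

With aliased arguments the by-reference version increments `a` twice, while the value/result version increments two private copies and writes back 1 both times.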

Moral

What’s the difference between call-by-value and call-by-value-result?

Is call-by-value/result more efficient than call-by-reference? Why or why not?

We’ll come back to a more interesting optimization called register promotion, which uses the same idea of pulling values into registers.

Call-by-name

Some languages, such as Algol 60, use call-by-name (Haskell uses its memoized variant, call-by-need).

Arguments are not evaluated until they are really needed in the callee.

For each argument, create a function, called a thunk.

    // code:
    int f (int name x, int y)
    {
        if (y)
            return x;
        else
            return 0;
    }

    // a call
    f (a, b);

Simulating call-by-name

    // original code:
    int f (int name x, int y)
    {
        if (y)
            return x;
        else
            return 0;
    }

    // a call
    f (a, b);

    // simulated:
    int f (fX: unit -> int, int y)
    {
        if (y)
            return fX ();
        else
            return 0;
    }

    // the call becomes:
    f (fn () => a, b);    // this function is not closed!
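Because the thunk `fn () => a` is not closed, a C simulation has to carry its environment explicitly. The following is a minimal sketch under that assumption; `struct thunk`, `force`, `struct arg_env` and the `eval_count` counter are all hypothetical names introduced for illustration.

```c
/* A thunk: a code pointer plus the environment it needs,
   since the argument expression refers to the caller's variable a. */
struct thunk {
    int (*code)(void *env);
    void *env;
};

int force(struct thunk t) { return t.code(t.env); }

/* Environment for the argument expression `a`.
   eval_count is instrumentation added for this sketch only. */
struct arg_env {
    int a;
    int eval_count;
};

int eval_a(void *env)
{
    struct arg_env *e = env;
    e->eval_count++;
    return e->a;
}

/* f with x passed by name: x is evaluated only if y is non-zero. */
int f(struct thunk x, int y)
{
    if (y)
        return force(x);
    else
        return 0;
}
```

Calling `f` with `y == 0` never runs `eval_a`, which is the point of delaying evaluation.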

Moral

A serious problem with call-by-name is that an argument may be evaluated many times.

A better solution is to memoize the evaluation result.

This method is called call-by-need, or sometimes lazy evaluation.

Simulating call-by-need

    // original code:
    int f (int need x, int y)
    {
        if (y)
            return x + x;
        else
            return 0;
    }

    // a call
    f (a, b);

    // simulated:
    int f (fX: unit -> int, int y) {
        if (y)
            return fX() + fX();
        else return 0;
    }

    // the call becomes:
    val xMemoize = ref NONE
    f (fn () =>
         case !xMemoize of
             NONE => (xMemoize := SOME a; a)    (* evaluate a, then store it *)
           | SOME i => i,
       b);
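The memoizing thunk can be sketched in C as a record with a cache slot playing the role of the `NONE`/`SOME` reference cell. All names here (`struct lazy`, `force`, `struct arg_env`, `eval_count`) are assumptions made for the example.

```c
/* A memoizing thunk: code, environment, and a cache slot. */
struct lazy {
    int (*code)(void *env);
    void *env;
    int forced;      /* 0 = not yet evaluated (NONE), 1 = cached (SOME) */
    int value;
};

int force(struct lazy *t)
{
    if (!t->forced) {                 /* first use: evaluate and store */
        t->value = t->code(t->env);
        t->forced = 1;
    }
    return t->value;                  /* later uses: reuse the stored value */
}

/* eval_count is instrumentation for this sketch only. */
struct arg_env {
    int a;
    int eval_count;
};

int eval_a(void *env)
{
    struct arg_env *e = env;
    e->eval_count++;
    return e->a;
}

/* f with x passed by need: x + x forces the thunk twice but evaluates once. */
int f(struct lazy *x, int y)
{
    if (y)
        return force(x) + force(x);
    else
        return 0;
}
```

The thunk is passed by pointer so that the cached value survives between the two uses of `x`.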

Where to pass the parameters?

Different calling conventions:

  pass them in registers

  pass them on the stack (typically: the call stack)

  a combination of the two: parts in registers, parts on the stack

This involves not only the ISA, but also the languages.

Sample Calling Conventions for C on x86 (from Wiki)

Registers

Register usage

Must be careful with register usage:

caller-save: the callee is free to destroy these registers

    eax, ecx, edx, eflags, fflags [and also all FP registers]

callee-save: the callee must restore these registers before returning to the caller

    ebp, esp, ebx, esi, edi [and also FP register stack top]

Register usage

Should a value reside in a caller-save or a callee-save register?

  not so easy to determine, and there are no general rules

  must be very careful with language features such as longjmp, goto or exceptions

We’ll come back to this issue later, in the register allocation part.

The Call Stack

Stack on x86

Two dedicated registers: %ebp and %esp.

The stack grows down, towards lower addresses.

A frame is also called an activation record.

    high address
        frame 0
        frame 1
        --------  <- %ebp
        frame 2
        --------  <- %esp
    low address

Stack Frame

    int f (int arg0, int arg1, …)
    {
        int local1;
        int local2;
        …;
    }

    (high address)
        …
        arg1
        arg0
        ret addr
        old ebp    <- %ebp
        local1
        local2     <- %esp
    (low address)
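Given this layout, every argument and local sits at a fixed displacement from %ebp. The helpers below are a small sketch that assumes every slot is 4 bytes wide (as for `int` on 32-bit x86); the function names are invented for illustration.

```c
/* Byte offset from %ebp of argument i (i = 0, 1, ...):
   skip the saved old ebp (4 bytes) and the return address (4 bytes). */
int arg_offset(int i)
{
    return 8 + 4 * i;
}

/* Byte offset from %ebp of local j (j = 1, 2, ...):
   locals sit just below the saved old ebp. */
int local_offset(int j)
{
    return -4 * j;
}
```

So arg0 is addressed as 8(%ebp) and local1 as -4(%ebp).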


Put these together

    // C code
    int main(void)
    { return f(8)+1; }

    int f(int x)
    { return g(x); }

    int g(int x)
    { return x+3; }

    // x86 code
    main:
        pushl %ebp
        movl %esp, %ebp
        pushl $8
        call f
        incl %eax
        leave
        ret

Put these together (continued)

    // x86 code
    f:
        pushl %ebp
        movl %esp, %ebp
        pushl 8(%ebp)
        call g
        leave
        ret

Put these together (continued)

    // x86 code
    g:
        pushl %ebp
        movl %esp, %ebp
        movl 8(%ebp), %eax
        addl $3, %eax
        leave
        ret

Implementation

Design a frame (activation record) data structure:

  the frame size

  garbage collection info

  detailed layout, etc.

Thus, hide the machine-related details: good for retargeting the compiler.

Interface

    signature FRAME =
    sig
      type t

      (* allocate space for a variable in frame *)
      val allocVar: unit -> unit

      (* create a new frame *)
      val new: unit -> t

      (* current size of the frame *)
      val size: unit -> int
    end
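For readers more at home in C, the same idea might look roughly as follows. This is a sketch, not the course's implementation: unlike the signature above it passes the frame explicitly and makes allocation return the new variable's %ebp-relative offset, and it assumes 4-byte slots. The names `frame_new`, `frame_alloc_var`, `frame_size` and `WORD_SIZE` are hypothetical.

```c
#include <stdlib.h>

#define WORD_SIZE 4                  /* assumed slot size in bytes */

struct frame {
    int size;                        /* bytes reserved for locals so far */
};

/* Create a new, empty frame (NULL if allocation fails). */
struct frame *frame_new(void)
{
    struct frame *f = malloc(sizeof *f);
    if (f != NULL)
        f->size = 0;
    return f;
}

/* Reserve one slot and return its offset relative to %ebp. */
int frame_alloc_var(struct frame *f)
{
    f->size += WORD_SIZE;
    return -f->size;
}

/* Current size of the frame's local area, in bytes. */
int frame_size(const struct frame *f)
{
    return f->size;
}
```

The rest of the code generator only sees offsets and sizes, so a different target would change this module alone.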

Frame on stack

Both function arguments and locals have a LIFO lifetime, as function calls do, so one can put the frame on the call stack.

But later, we will have the chance to see other possibilities, e.g. higher-order nested functions.

Nested Function

Nested Functions

Functions declared in the body of another function.

So the inner one can refer to the variables in the outer ones; such functions are called open.

    int f (int x, int y)
    {
        int m;
        int g (int z)
        {
            int h ()
            {
                return m+z;
            }
            return 1;
        }
        return 0;
    }

How to access those variables in outer functions?

Three classical methods: lambda lifting, static link, display

Lambda lifting

In lambda lifting, the program is translated into a form such that all procedures are closed

The translation process starts with the inner-most procedures and works its way outwards

Lambda lifting example

    // original
    int f (int x, int y)
    {
        int m;
        int g (int z)
        {
            int h ()
            {
                return m+z;
            }
            return 1;
        }
        return 0;
    }

    // step 1: close the inner-most function h
    int f (int x, int y)
    {
        int m;
        int g (int z)
        {
            int h (int &m, int &z)
            {
                return m+z;
            }
            return 1;
        }
        return 0;
    }

Lambda lifting example (continued)

    // step 2: close g
    int f (int x, int y)
    {
        int m;
        int g (int &m, int z)
        {
            int h (int &m, int &z)
            {
                return m+z;
            }
            return 1;
        }
        return 0;
    }

Lambda lifting example (continued)

    // step 3: flatten
    int f (int x, int y){
        int m;
        return 0;
    }

    int g (int &m, int z){
        return 1;
    }

    int h (int &m, int &z){
        return m+z;
    }
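In plain C, where references become pointers, the flattened result might look like the sketch below. To make the effect observable, this version adds things the slide does not have: `f` initializes `m` and calls `g`, and `g` calls `h`. Those calls and the initialization are assumptions for illustration only.

```c
/* h is closed: m and z arrive as explicit by-reference parameters. */
int h(int *m, int *z)
{
    return *m + *z;
}

/* g receives m from f and forwards it, together with its own z, to h. */
int g(int *m, int z)
{
    return h(m, &z);     /* assumed call, not in the slide */
}

int f(int x, int y)
{
    int m = x + y;       /* assumed initialization, not in the slide */
    return g(&m, 1);     /* assumed call, not in the slide */
}
```

Note that `m` and `z` both have their addresses taken, which is the "all variables are escaped" cost of this approach.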

Moral

Pros:

  easy to implement: a source-to-source translation, done even before code generation

Cons:

  all variables are escaped

  extra argument passing: on some architectures, additional arguments are passed in memory, so it’s inefficient

Static links

An alternative approach is to add an additional piece of information to the activation records, called the static link.

The static link is a pointer to the activation record of the enclosing procedure.

Used in the Borland Turbo Pascal compiler.

Static links example

    // original
    int f (int x, int y)
    {
        int m;
        int g (int z)
        {
            int h ()
            {
                return m+z;
            }
            return 1;
        }
        return 0;
    }

    // with static links
    int f (link, int x, int y)
    {
        int m;
        int g (link, int z){
            int h (link){
                return link->prev->m + link->z;
            }
            return 1;
        }
        return 0;
    }
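The `link->prev->m` access can be imitated in standard C by making each activation record an explicit struct. This is only a simulation under assumptions: the struct names are invented, and the calls from `f` to `g` and from `g` to `h`, plus the initialization of `m`, are added so that a result can be observed.

```c
/* Activation record of f: its parameters and its local m. */
struct f_frame {
    int x, y, m;
};

/* Activation record of g: prev is the static link to f's record. */
struct g_frame {
    struct f_frame *prev;
    int z;
};

/* h receives a static link to g's record and climbs one more link for m. */
int h(struct g_frame *link)
{
    return link->prev->m + link->z;
}

int g(struct f_frame *link, int z)
{
    struct g_frame frame = { link, z };
    return h(&frame);                /* assumed call, not in the slide */
}

int f(int x, int y)
{
    struct f_frame frame = { x, y, 0 };
    frame.m = x + y;                 /* assumed initialization */
    return g(&frame, 1);             /* assumed call, not in the slide */
}
```

Only one extra pointer is passed per call, but reaching `m` from `h` costs two dereferences.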

Pros and cons

Pros: little extra overhead on parameter passing: only the static link is added.

Cons: there is still the overhead of climbing up a static link chain to access non-locals.

Implementation details

First, each function is annotated with its nesting depth, and hence so are its variables.

When a function at depth n accesses a variable at depth m, emit code to climb up n-m links to reach the appropriate activation record.

Implementation details

When a procedure p at depth n calls a procedure q at depth m:

  if n<m (i.e., q is nested within p):

    note: in first-order languages, n=m-1

    q’s static link = q’s dynamic link

  if n>=m:

    q’s prelude must follow n-m static links, starting from the caller’s (p’s) static link

    the result is the static link for q
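The two cases can be written down as a small routine. The sketch assumes each activation record stores its static link in a field and that the outermost level has a null link; it follows n-m links in the n>=m case, which lands on the record at depth m-1. `struct frame`, `climb` and `static_link_for_callee` are hypothetical names.

```c
#include <stddef.h>

struct frame {
    struct frame *static_link;   /* record of the lexically enclosing function */
};

/* Follow `hops` static links starting from `start`. */
struct frame *climb(struct frame *start, int hops)
{
    while (hops-- > 0 && start != NULL)
        start = start->static_link;
    return start;
}

/* Static link to hand to q (depth m) when p (record p_frame, depth n) calls it. */
struct frame *static_link_for_callee(struct frame *p_frame, int n, int m)
{
    if (n < m)                   /* q is nested directly in p: n == m - 1 */
        return p_frame;
    /* n >= m: start from p's own static link and climb n - m links */
    return climb(p_frame->static_link, n - m);
}
```

For a recursive call (n equal to m) no climbing is needed: the callee simply inherits the caller's static link.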

Moral

In theory, static links don’t seem very good: functions may be deeply nested.

However, real programs access mainly local/global variables, or occasionally variables just one or a few static links away.

Still, experimentation shows that static links are inferior to the lambda-lifting approach.

Personally, I believe static links are not amenable to optimizations.

Display

The third way to handle nested functions is to use a display.

A display is a small stack of pointers to activation records.

The display keeps track of the lexical nesting structure of the program.

Essentially, it points to the current set of activation records that contain accessible variables.
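A display can be sketched as an array indexed by nesting depth. The save-and-restore discipline on function entry and exit shown here is the usual textbook scheme and is an assumption on top of the slide, as are all names (`display`, `display_enter`, `display_exit`, `display_load`, `MAX_DEPTH`) and the fixed four-slot frame.

```c
#include <stddef.h>

#define MAX_DEPTH 8              /* assumed maximum nesting depth */

struct frame {
    int vars[4];                 /* simplified: four variable slots per frame */
};

/* display[d] points to the most recent activation record at depth d. */
static struct frame *display[MAX_DEPTH];

/* On entry to a function at depth d: save the old entry, install the new frame. */
struct frame *display_enter(int d, struct frame *fr)
{
    struct frame *saved = display[d];
    display[d] = fr;
    return saved;
}

/* On exit: restore the entry that was saved on entry. */
void display_exit(int d, struct frame *saved)
{
    display[d] = saved;
}

/* Read a variable of the enclosing function at depth d:
   one indexed load, with no chain of links to climb. */
int display_load(int d, int index)
{
    return display[d]->vars[index];
}
```

Compared with static links, access to a non-local costs the same no matter how many levels away it is declared.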

Higher-order functions

Functions may serve more than just being called:

  can be passed as arguments

  can be returned as results

  can be stored in data structures (objects! we’ll discuss them later)

If functions don’t nest, then the implementation is simple: a plain code address, e.g. the “function pointer” in C.
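As a small illustration of the non-nested case, a C function pointer is enough to pass a function as an argument. The names `add_three` and `apply_twice` are made up for this example.

```c
/* An ordinary top-level function: it refers to no outer locals. */
int add_three(int x)
{
    return x + 3;
}

/* A higher-order function: fn is just a code address. */
int apply_twice(int (*fn)(int), int x)
{
    return fn(fn(x));
}
```

No environment has to travel with `fn`, because a top-level C function cannot mention variables of an enclosing function.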

Higher-order functions

But if functions do nest, it’s much trickier to compile: as found in Lisp, ML, Scheme, and even in recent versions of C# and Java.

Later, we’ll discuss more advanced techniques to handle this.
