Simplistic Code Generation - cs.tau.ac.il


Simplistic Code Generation

Mooly Sagiv

• Steven Muchnick: Advanced Compiler Design and Implementation
• https://www.cis.upenn.edu/~stevez/ CS341
• Aho, Sethi, Ullman, Compiler Design
• https://en.wikipedia.org/wiki/Sethi%E2%80%93Ullman_algorithm

Outline

• Recap activation frames

• X86 principles

• Direct AST to X86 translation
  • The labeling algorithm for register allocation

Local/Temporary Variable Storage

• Need space to store
  • Global variables
  • Values passed as arguments to procedures
  • Local variables (either defined in the source program or introduced by the compiler)

• Processors provide two options
  • Registers: fast, small size (64 bits), very limited number
  • Memory: slow, very large amount of space (2 GB)
    • caching important

• In practice on X86
  • Registers are limited (and have restrictions)
  • Divide memory into regions including the stack and the heap

The C memory model

• The code & data (or "text") segment
  • contains compiled code, constant strings, etc.

• The Heap
  • Stores dynamically allocated objects
  • Allocated via "malloc"
  • Deallocated via "free" or garbage collection
  • C runtime system

• The Stack
  • Stores local variables
  • Stores the return address of a function
  • Compiler generated code to create/delete new frames

[Figure: memory layout showing the Code, Heap, and Stack regions, with an arrow pointing toward larger addresses]

Questions

• Why store local variables in stack frames?

• Can we store stack frames in the heap (e.g., via malloc/new)?

• What cannot be stored in a stack frame?

• Why do we use two machine registers to implement stack frames?

• What security risks do stack frames raise?

Compiling factorial

int factorial(int num) {
    if (num == 1) return 1;
    else return num * factorial(num - 1);
}

factorial(int):
    push rbp
    mov rbp, rsp
    sub rsp, 16
    mov DWORD PTR [rbp-4], edi
    cmp DWORD PTR [rbp-4], 1
    jne .L2
    mov eax, 1
    jmp .L3
.L2:
    mov eax, DWORD PTR [rbp-4]
    sub eax, 1
    mov edi, eax
    call factorial(int)
    imul eax, DWORD PTR [rbp-4]
.L3:
    leave
    ret

Can we store activation frames in the heap?

Limitations of Stack Frames

• A local variable of P cannot be stored in the activation record of P if its duration exceeds the duration of P

• Example 1: Static variables in C (own variables in Algol)

    void p(int x) {
        static int y = 6;
        y += x;
    }

• Example 2: Features of the C language

    int *f() {
        int x;
        return &x;
    }

• Example 3: Dynamic allocation

    int *f() { return (int *) malloc(sizeof(int)); }

Compiling factorial no rbp

int factorial(int num) {
    if (num == 1) return 1;
    else return num * factorial(num - 1);
}

factorial(int):
    sub rsp, 24
    mov DWORD PTR [rsp+4], edi
    cmp DWORD PTR [rsp+4], 1
    jne .L2
    mov eax, 1
    jmp .L3
.L2:
    mov eax, DWORD PTR [rsp+4]
    sub eax, 1
    mov edi, eax
    call factorial(int)
    imul eax, DWORD PTR [rsp+4]
.L3:
    add rsp, 24
    ret

Dynamic Frame Size

// crt_malloca_simple.c
#include <stdio.h>
#include <malloc.h>

void Fn() {
    char *buf = (char *)_malloca(100);
    // do something with buf
}

int main() {
    Fn();
}

What are the security risks of frames?

int foo() {
    int a, b;
    int *p = &a;
    scanf("%d", &b);
    *(p+b) = 5;
}

.LC0:
    .string "%d"
foo:
    push rbp
    mov rbp, rsp
    sub rsp, 16
    lea rax, [rbp-12]
    mov QWORD PTR [rbp-8], rax
    lea rax, [rbp-16]
    mov rsi, rax
    mov edi, OFFSET FLAT:.LC0
    mov eax, 0
    call __isoc99_scanf
    mov eax, DWORD PTR [rbp-16]
    cdqe
    lea rdx, [0+rax*4]
    mov rax, QWORD PTR [rbp-8]
    add rax, rdx
    mov DWORD PTR [rax], 5
    nop
    leave
    ret

Buffer Overflow Exploits

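void foo(char *x) {
    char buf[2];
    strcpy(buf, x);
}

int main(int argc, char *argv[]) {
    foo(argv[1]);
}

./a.out abracadabra
Segmentation fault

[Figure: stack layout of foo, from higher to lower memory addresses (the stack grows toward lower addresses): previous frame, return address, saved FP, char* x, buf[2]. The input "abracadabra" is shown two characters per slot (ab, ra, ca, da, br), running past buf[2] into the slots above it.]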

Buffer Overflow Exploits

int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);
    if (strcmp(password_buffer, "brillig") == 0) auth_flag = 1;
    if (strcmp(password_buffer, "outgrabe") == 0) auth_flag = 1;
    return auth_flag;
}

int main(int argc, char *argv[]) {
    if (check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    }
    else printf("\nAccess Denied.\n");
}

(source: "Hacking: The Art of Exploitation, 2nd Ed")

Input Validation

[Figure: evil input passed to the application; annotation "65"]

Input: AAAAAAAAAAAA

Output:
-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Access Granted.
-=-=-=-=-=-=-=-=-=-=-=-=-=-

Preventing buffer overflow exploits?
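One possible answer, sketched here as an illustration rather than as the lecture's own solution, is to validate the input length before copying. The helper names copy_checked and check_authentication_checked are hypothetical; the buffer size and the two passwords are taken from the check_authentication example above.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst only if it fits, including the terminating '\0'.
 * Returns 0 on success and -1 (leaving dst untouched) when src is too long.
 * copy_checked is a hypothetical helper, not part of the lecture's code. */
int copy_checked(char *dst, size_t dst_size, const char *src) {
    size_t len = strlen(src);
    if (len >= dst_size) {
        return -1;              /* would overflow: reject the input */
    }
    memcpy(dst, src, len + 1);  /* len + 1 also copies the '\0' */
    return 0;
}

/* Variant of check_authentication with the unchecked strcpy replaced. */
int check_authentication_checked(const char *password) {
    char password_buffer[16];
    if (copy_checked(password_buffer, sizeof password_buffer, password) != 0) {
        return 0;               /* over-long input counts as a failed login */
    }
    return strcmp(password_buffer, "brillig") == 0 ||
           strcmp(password_buffer, "outgrabe") == 0;
}
```

With this check, a long run of 'A' characters is rejected before it can reach the stack slots next to password_buffer. Length checks are only one mitigation among several and do not replace the others.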

The rest of this lecture

• X86 Principles

• AST to X86 translation
  • The labeling algorithm for register allocation

• Intermediate Representations

X86 Assembly

• CISC

• 2-address instructions: [op arg1, arg2] means arg1 ← op(arg1, arg2)

• Diverse data types 8-, 16-, 32-, 64-bit values + floating points, …

• Intel 64 and IA 32 architectures have a huge number of functions

• instructions range in size from 1 byte to 15 bytes

• Lots of hold-over design decisions for backwards compatibility

• Hard to understand

• The main ideas can be explained using a simple subset X86lite:
  • Only 64 bit signed integers (no floating point, no 16bit, no …)
  • 20 instructions

X86lite Registers: 16 64-bit registers

register        usage
rax             general purpose accumulator
rbx             base register, pointer to data
rcx             counter register for strings & loops
rdx             data register for I/O
rsi             pointer register, string source register
rdi             pointer register, string destination register
rbp             base pointer, points to the stack frame
rsp             stack pointer, points to the top of the stack
r8-r15          general purpose registers
rip (virtual)   current machine instruction

Jumps, Call and Return

Instruction   Informal                                          Formal
jmp dst       Control goes to dst                               rip ← dst
call dst      Control goes to dst and returns to the            push rip
              following instruction upon termination of dst     rip ← dst
ret           Control returns to the caller                     pop rip

Enter and Leave

Instruction    Informal                             Formal
enter #bytes   Open a stack frame of size #bytes    push rbp
                                                    mov rbp, rsp
                                                    sub rsp, #bytes
leave          Restore caller's stack frame         mov rsp, rbp
                                                    pop rbp

Directly Translating AST to Assembly

• For simple languages, no need for intermediate representation

• Main Idea: Maintain invariants
  • Code emitted for a given expression computes the answer into rax

• Key Challenges:
  • storing intermediate values needed to compute complex expressions
  • some instructions use specific registers (e.g. shift)

Calling Conventions

• Specify the locations (e.g. register or stack) of arguments passed to a function and returned by the function

• Designate registers either
  • Caller Save – e.g. freely usable by the called code
  • Callee Save – e.g. must be restored by the called code

• Define the protocol for deallocating stack-allocated arguments
  • Caller cleans up
  • Callee cleans up (makes variable arguments harder)

int64_t g(int64_t a, int64_t b) {   // callee
    return a + b;
}

int64_t f(int64_t x) {              // caller
    int64_t ans = g(3, 4) + x;
    return ans;
}

x64 Calling Conventions: Caller Protocol

Callee Prologue

Callee Invariant: function argument

Callee Invariant: callee saved registers

Callee epilogue

Caller-Save and Callee-Save Registers

• callee-save registers (MIPS $16-$23; X86 rbx, rbp, rsp, r12-r15)
  • Saved by the callee when modified
  • Values are automatically preserved across calls

• caller-save registers
  • Saved by the caller when needed
  • Values are not automatically preserved

• Usually the architecture defines caller-save and callee-save registers
  • Separate compilation
  • Interoperability between code produced by different compilers/languages

• But compilers can decide when to use caller/callee registers

Caller-Save vs. Callee-Save Registers

int foo(int a) {
    int b = a + 1;
    f1();
    g1(b);
    return (b + 2);
}

void bar(int y) {
    int x = y + 1;
    f2(y);
    g2(2);
}

Syntax Directed Code Generation (Expressions)

• Generate code for arguments in a designated register and store in stack

• Generate code for expressions using stack operations

Naïve Code Generation: Expression

generateCode(node: expression) {
    switch node: {
        case number(n: integer) {
            emit(mov eax, $n)
        }
        case localVariable(v: symbol) {
            let o: integer = offsetFrame(v)
            emit(mov eax, DWORD PTR [rbp-$o])
        }
        case e1: Node + e2: Node {
            generateCode(e1)      // Generate code for lhs into eax
            emit(push eax)        // Store lhs on the stack
            generateCode(e2)      // Generate code for rhs into eax
            emit(mov edx, eax)    // rhs into edx
            emit(pop eax)         // lhs into eax
            emit(add eax, edx)
        }
    }
}
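The scheme above can be written as a small C sketch. The Expr type, the emit helper, and the choice to store a variable's frame offset directly in the node are assumptions made for illustration; the emitted text follows the slides' simplified notation (in real 64-bit code, push and pop take 64-bit registers such as rax).

```c
#include <stdio.h>
#include <string.h>

typedef enum { NUM, VAR, ADD } Kind;

typedef struct Expr {
    Kind kind;
    int value;                       /* NUM: the constant; VAR: frame offset (assumed known) */
    const struct Expr *left, *right; /* ADD only */
} Expr;

/* Append one instruction line to out without writing past cap bytes. */
static void emit(char *out, size_t cap, const char *line) {
    size_t used = strlen(out);
    if (used + 1 < cap) {
        snprintf(out + used, cap - used, "%s\n", line);
    }
}

/* Invariant: the code emitted for e leaves its value in eax.
 * out must hold a (possibly empty) string on entry. */
void generate_code(const Expr *e, char *out, size_t cap) {
    char line[64];
    switch (e->kind) {
    case NUM:
        snprintf(line, sizeof line, "mov eax, %d", e->value);
        emit(out, cap, line);
        break;
    case VAR:
        snprintf(line, sizeof line, "mov eax, DWORD PTR [rbp-%d]", e->value);
        emit(out, cap, line);
        break;
    case ADD:
        generate_code(e->left, out, cap);   /* lhs into eax */
        emit(out, cap, "push eax");         /* save lhs on the stack */
        generate_code(e->right, out, cap);  /* rhs into eax */
        emit(out, cap, "mov edx, eax");     /* rhs into edx */
        emit(out, cap, "pop eax");          /* lhs back into eax */
        emit(out, cap, "add eax, edx");     /* eax = lhs + rhs */
        break;
    }
}
```

Calling generate_code on a tree for 5 + (x + 7), with x at offset 16 and an empty output buffer, yields eleven instructions: every addition costs a push, a pop, and a register move regardless of how simple its operands are.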

Abstract Syntax for Arithmetic Expressions

Exp → id (IdExp)
Exp → num (NumExp)
Exp → Exp Binop Exp (BinExp)
Exp → Unop Exp (UnExp)
Binop → + (Plus)
Binop → - (Minus)
Binop → * (Times)
Binop → / (Div)
Unop → - (UnMin)

package Absyn;

abstract public class Absyn { public int pos; }

abstract class Exp extends Absyn { }

class IdExp extends Exp {
    String rep;
    IdExp(String r) { rep = r; }
}

class NumExp extends Exp {
    int number;
    NumExp(int n) { number = n; }
}

// Operator codes: OpExp.PLUS, OpExp.Minus, OpExp.Times, OpExp.Div
class OpExp {
    public final static int PLUS = 1;
    public final static int Minus = 2;
    public final static int Times = 3;
    public final static int Div = 4;
}

class BinExp extends Exp {
    Exp left, right;
    int op;   // one of the OpExp constants
    BinExp(Exp l, int o, Exp r) {
        left = l; op = o; right = r;
    }
}

Java Code For Expressions

static void codeGen(Exp e) {
    if (e instanceof IdExp) {
        System.out.printf("mov eax, DWORD PTR [rbp-%d]\n", offset(((IdExp) e).rep));
    }
    else if (e instanceof NumExp) {
        System.out.printf("mov eax, %d\n", ((NumExp) e).number);
    }
    else if (e instanceof BinExp) {
        BinExp eb = (BinExp) e;
        codeGen(eb.left);                      // Generate code computing lhs into eax
        System.out.printf("push eax\n");       // Push lhs onto the stack
        codeGen(eb.right);                     // Generate code computing rhs into eax
        System.out.printf("mov edx, eax\n");   // rhs into edx
        System.out.printf("pop eax\n");        // lhs into eax
        switch (eb.op) {
            case OpExp.PLUS: System.out.printf("add eax, edx\n"); break;
            …
        }
    }
    …
}

Example Compilation

Expression tree for 5 + (x + 7), with x stored at [rbp-16]:

+
  5
  +
    x
    7

Generated code:

    mov eax, 5
    push eax
    mov eax, DWORD PTR [rbp-16]
    push eax
    mov eax, 7
    mov edx, eax
    pop eax
    add eax, edx
    mov edx, eax
    pop eax
    add eax, edx

Executing the generated code

[Figure: animation of the Code/Data and Stack regions with rbp and rsp; the stack holds 777777777 and the slot at rbp-16 holds 80]

State after each instruction (the pushed values stay visible in memory after a pop, since pop only moves rsp):

Instruction                     eax   edx   values written by push
mov eax, 5                      5
push eax                        5           5
mov eax, DWORD PTR [rbp-16]     80          5
push eax                        80          5, 80
mov eax, 7                      7           5, 80
mov edx, eax                    7     7     5, 80
pop eax                         80    7     5, 80
add eax, edx                    87    7     5, 80
mov edx, eax                    87    87    5, 80
pop eax                         5     87    5, 80
add eax, edx                    92    87    5, 80
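The execution can be replayed with a small C model of the machine. This is a sketch under assumptions: the Machine struct, the run function, and the fixed-size stack are hypothetical names and simplifications, and a variable node simply carries its current value instead of being loaded from [rbp-16].

```c
#include <stddef.h>

typedef enum { NUM, VAR, ADD } Kind;

typedef struct Expr {
    Kind kind;
    int value;                       /* NUM: the constant; VAR: the variable's current value */
    const struct Expr *left, *right; /* ADD only */
} Expr;

typedef struct Machine {
    int eax, edx;
    int stack[64];   /* fixed depth: enough for small examples, not checked */
    int sp;          /* number of values currently pushed */
} Machine;

/* Mirrors the generated code: every case ends with the result in eax. */
void run(const Expr *e, Machine *m) {
    switch (e->kind) {
    case NUM:
    case VAR:
        m->eax = e->value;               /* mov eax, ... */
        break;
    case ADD:
        run(e->left, m);                 /* lhs into eax */
        m->stack[m->sp++] = m->eax;      /* push eax */
        run(e->right, m);                /* rhs into eax */
        m->edx = m->eax;                 /* mov edx, eax */
        m->eax = m->stack[--m->sp];      /* pop eax */
        m->eax += m->edx;                /* add eax, edx */
        break;
    }
}
```

For 5 + (x + 7) with x = 80, starting from a zeroed Machine, run finishes with eax = 92, edx = 87, and an empty stack.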

Can we generate more efficient code?

• How to better utilize machine registers

• Expression order does not matter
  • What is the result of “x = 1 ; ++x + (x +1)”?

• The code of the right-subtree can appear before the code of the left-subtree

• Often leads to faster code with fewer registers and load/store

• Dynamic programming can be used to compute “optimal” solution

Two Phase Solution: Dynamic Programming (Sethi & Ullman)

• Bottom-up (labeling)
  • Compute for every subtree the minimal number of registers needed (its weight)

• Top-Down
  • Generate the code using labeling by preferring “heavier” subtrees (larger labeling)

• Can integrate spilling

The Labeling Principle

For a + node whose left subtree needs m registers and whose right subtree needs n registers:

• m > n: the node needs m registers
• m < n: the node needs n registers
• m = n: the node needs m+1 registers

The Labeling Algorithm

weight(node: expression): integer {
    switch node: {
        case number(n: integer): return 1;
        case localVariable(v: symbol): return 1;
        case e1: Node + e2: Node {
            let lw: integer = weight(e1);
            let rw: integer = weight(e2);
            if (lw < rw) return rw;
            else if (lw > rw) return lw;
            else return lw + 1;
        }
        …
    }
}
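A direct C rendering of the labeling function might look as follows. The Node layout (an operator character, with 0 marking a leaf) is an assumption made for this sketch.

```c
#include <stddef.h>

typedef struct Node {
    char op;                          /* '+', '-', '*', '/' or 0 for a leaf */
    const char *name;                 /* leaf text such as "b" or "4" */
    const struct Node *left, *right;  /* NULL for leaves */
} Node;

/* Minimal number of registers needed to evaluate n without spilling. */
int weight(const Node *n) {
    if (n->op == 0) return 1;         /* numbers and variables need one register */
    int lw = weight(n->left);
    int rw = weight(n->right);
    if (lw < rw) return rw;           /* evaluate the heavier side first */
    if (lw > rw) return lw;
    return lw + 1;                    /* equal weights: one extra register holds the first result */
}
```

For the tree of b*b - 4*(a*c), the function returns 2 for each product and 3 for the root.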

Labeling the example (weight)

Tree for b*b - 4*(a*c), with the weight of each node:

- (3)
  * (2)
    b (1)
    b (1)
  * (2)
    4 (1)
    * (2)
      a (1)
      c (1)

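Top-Down

Each node is annotated with its weight and its target register T:

- (3, T=R1)
  * (2, T=R1)
    b (1, T=R1)
    b (1, T=R2)
  * (2, T=R2)
    4 (1, T=R2)
    * (2, T=R3)
      a (1, T=R3)
      c (1, T=R2)

Generated code, in execution order (the heavier subtree a*c is evaluated before 4):

    move R1, b
    move R2, b
    mult R1, R2
    move R3, a
    move R2, c
    mult R3, R2
    move R2, 4
    mult R2, R3
    sub R1, R2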
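The top-down phase can be sketched in C as below. This is one way to realize "prefer the heavier subtree", not necessarily the lecture's exact formulation: gen, generate, append, mnemonic, and MAX_REGS are hypothetical names, registers are plain integers printed as R1, R2, ..., and there is no spilling, so generate refuses trees heavier than the register budget.

```c
#include <stdio.h>
#include <string.h>

#define MAX_REGS 16

typedef struct Node {
    char op;                          /* '+', '-', '*', '/' or 0 for a leaf */
    const char *name;                 /* leaf text such as "b" or "4" */
    const struct Node *left, *right;  /* NULL for leaves */
} Node;

int weight(const Node *n) {
    if (n->op == 0) return 1;
    int lw = weight(n->left), rw = weight(n->right);
    if (lw < rw) return rw;
    if (lw > rw) return lw;
    return lw + 1;
}

static const char *mnemonic(char op) {
    switch (op) {
    case '+': return "add";
    case '-': return "sub";
    case '*': return "mult";
    default:  return "div";
    }
}

static void append(char *out, size_t cap, const char *line) {
    size_t used = strlen(out);
    if (used < cap) snprintf(out + used, cap - used, "%s", line);
}

/* regs[0..nregs-1] are free; the value of n ends up in regs[0].
 * Assumes weight(n) <= nregs, which generate() checks once at the root. */
static void gen(const Node *n, const int *regs, int nregs, char *out, size_t cap) {
    char line[64];
    int tmp[MAX_REGS];
    if (n->op == 0) {
        snprintf(line, sizeof line, "move R%d, %s\n", regs[0], n->name);
        append(out, cap, line);
        return;
    }
    if (weight(n->right) > weight(n->left)) {
        /* Right side is heavier: compute it first into regs[1] with all registers free. */
        memcpy(tmp, regs, (size_t)nregs * sizeof *regs);
        tmp[0] = regs[1];
        tmp[1] = regs[0];
        gen(n->right, tmp, nregs, out, cap);
        /* Then the left side into regs[0], using every register except regs[1]. */
        tmp[0] = regs[0];
        for (int i = 2; i < nregs; i++) tmp[i - 1] = regs[i];
        gen(n->left, tmp, nregs - 1, out, cap);
    } else {
        gen(n->left, regs, nregs, out, cap);           /* left into regs[0] */
        gen(n->right, regs + 1, nregs - 1, out, cap);  /* right into regs[1] */
    }
    snprintf(line, sizeof line, "%s R%d, R%d\n", mnemonic(n->op), regs[0], regs[1]);
    append(out, cap, line);
}

/* Returns 0 on success, -1 when the tree needs more registers than provided. */
int generate(const Node *n, const int *regs, int nregs, char *out, size_t cap) {
    if (cap == 0 || nregs > MAX_REGS || weight(n) > nregs) return -1;
    out[0] = '\0';
    gen(n, regs, nregs, out, cap);
    return 0;
}
```

For b*b - 4*(a*c) with registers {1, 2, 3}, the sketch emits nine instructions and leaves the result in R1; with only two registers it returns -1, which is the point where a spilling strategy would have to take over.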

Generalizations

• More than two arguments for operators
  • Function calls

• Register/memory operations

• Multiple affected registers

• Spilling
  • Need more registers than available

Register Memory Operations

• add R1, X

• mult R1, X

• No need for registers to store right operands
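With register/memory instructions the labeling changes only at the leaves: a leaf used as a right operand can stay in memory and needs no register. The sketch below shows that variant; the names weight_rm and weight_rm_at and the Node layout are assumptions made for illustration.

```c
#include <stddef.h>

typedef struct Node {
    char op;                          /* '+', '-', '*', '/' or 0 for a leaf */
    const char *name;                 /* leaf text such as "b" or "4" */
    const struct Node *left, *right;  /* NULL for leaves */
} Node;

/* is_right_operand is nonzero when n is the right child of its parent. */
static int weight_rm_at(const Node *n, int is_right_operand) {
    if (n->op == 0) {
        return is_right_operand ? 0 : 1;  /* right leaves are used directly from memory */
    }
    int lw = weight_rm_at(n->left, 0);
    int rw = weight_rm_at(n->right, 1);
    if (lw < rw) return rw;
    if (lw > rw) return lw;
    return lw + 1;
}

/* Registers needed for the whole tree; the root counts as a left operand. */
int weight_rm(const Node *n) {
    return weight_rm_at(n, 0);
}
```

For b*b - 4*(a*c) this gives 1 for b*b and for a*c, and 2 for the whole expression, one register fewer than with register-only operands.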

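Labeling the example (weight)

Tree for b*b - 4*(a*c) with register/memory operations, with the weight of each node:

- (2)
  * (1)
    b (1)
    b (0)
  * (2)
    4 (1)
    * (1)
      a (1)
      c (0)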

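Top-Down

Each node is annotated with its weight; nodes that need a register also show their target register T, as implied by the generated code:

- (2, T=R1)
  * (1, T=R1)
    b (1, T=R1)
    b (0)
  * (2, T=R2)
    4 (1, T=R2)
    * (1, T=R1)
      a (1, T=R1)
      c (0)

Generated code, in execution order (the heavier right subtree is evaluated first):

    move R2, 4
    move R1, a
    mult R1, c
    mult R2, R1
    move R1, b
    mult R1, b
    sub R1, R2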

Empirical Results

• Experience shows that for handwritten programs 5 registers suffice (Yuval 1977)

• But program generators may produce arbitrarily complex expressions

Spilling

• Even an optimal register allocator can require more registers than available

• Need to generate code for every correct program

• The compiler can save temporary results
  • Spill registers into temporaries

• Load when needed

• Many heuristics exist

Simple Spilling Method

• Heavy tree – Needs more registers than available

• A ‘heavy’ tree contains a ‘heavy’ subtree whose dependents are ‘light’

• Generate code for the light tree

• Spill the content into memory and replace subtree by temporary

• Generate code for the resultant tree

Summary (Register allocation)

• Register allocation of expressions is simple

• Good in practice

• Optimal under certain conditions
  • Uniform instruction cost
  • ‘Symbolic’ trees

• Can handle non-uniform cost
  • Code-Generator Generators exist (BURS)

• Even simpler for 3-address machines

• Simple ways to determine best orders

• But misses opportunities to share registers between different expressions
  • Can employ certain conventions

• Better solutions exist
  • Graph coloring

Why do something else?

• The resulting code quality is poor

• Richer source language features are hard to encode
  • Structured data types, objects, first-class functions, …

• hard to optimize the resulting assembly code

• The representation is too concrete – e.g. it has committed to using certain registers and the stack
  • Only a fixed number of registers
  • Some instructions have restrictions on where the operands are located

• Control-flow is not structured:
  • Arbitrary jumps from one code block to another
  • Implicit fall-through makes sequences of code non-modular (i.e. you can’t rearrange sequences of code easily)

• Retargeting the compiler to a new architecture is hard.

• Target assembly code is hard-wired into the translation

Lecture Summary

• Simple X86 code generation from AST is conceptually easy

• But poor generated code

• No global optimizations

• No modularity
  • Hard to retarget to different machines

• Hard to reuse for different source languages

• Hard to maintain