Simplistic Code Generation
Mooly Sagiv
Steven Muchnick: Advanced Compiler Design and Implementation
https://www.cis.upenn.edu/~stevez/ CS341
Aho, Sethi, Ullman, Compiler Design
https://en.wikipedia.org/wiki/Sethi%E2%80%93Ullman_algorithm
Outline
• Recap activation frames
• X86 principles
• Direct AST to X86 translation
• The labeling algorithm for register allocation
Local/Temporary Variable Storage
• Need space to store
  • Global variables
  • Values passed as arguments to procedures
  • Local variables (either defined in the source program or introduced by the compiler)
• Processors provide two options
  • Registers: fast, small size (64 bits), very limited number
  • Memory: slow, very large amount of space (2 GB)
    • caching important
• In practice on X86
  • Registers are limited (and have restrictions)
  • Divide memory into regions including the stack and the heap
The C memory model
• The code & data (or "text") segment
  • contains compiled code, constant strings, etc.
• The Heap
  • Stores dynamically allocated objects
  • Allocated via "malloc"
  • Deallocated via "free" or garbage collection
  • C runtime system
• The Stack
  • Stores local variables
  • Stores the return address of a function
  • Compiler-generated code creates/deletes frames

[Figure: memory layout showing the Code, Heap, and Stack regions, with an arrow labeled "Larger address"]
Questions
• Why store local variables in stack frames?
• Can we store stack frames in the heap (e.g., via malloc/new)?
• What cannot be stored in a stack frame?
• Why do we use two machine registers to implement stack frames?
• What security risks do stack frames raise?
Compiling factorial
int factorial(int num) {
  if (num == 1) return 1;
  else return num * factorial(num - 1);
}

factorial(int):
  push rbp
  mov rbp, rsp
  sub rsp, 16
  mov DWORD PTR [rbp-4], edi
  cmp DWORD PTR [rbp-4], 1
  jne .L2
  mov eax, 1
  jmp .L3
.L2:
  mov eax, DWORD PTR [rbp-4]
  sub eax, 1
  mov edi, eax
  call factorial(int)
  imul eax, DWORD PTR [rbp-4]
.L3:
  leave
  ret
Can we store activation frames in the heap?
Limitations of Stack Frames
• A local variable of P cannot be stored in the activation record of P if its duration exceeds the duration of P
• Example 1: Static variables in C (own variables in Algol)
  void p(int x) {
    static int y = 6;
    y += x;
  }
• Example 2: Features of the C language
  int *f() {
    int x;
    return &x;
  }
• Example 3: Dynamic allocation
  int *f() { return (int *) malloc(sizeof(int)); }
Compiling factorial no rbp
int factorial(int num) {
  if (num == 1) return 1;
  else return num * factorial(num - 1);
}

factorial(int):
  sub rsp, 24                    ; no frame pointer; 24 keeps rsp 16-byte aligned at the call
  mov DWORD PTR [rsp+4], edi
  cmp DWORD PTR [rsp+4], 1
  jne .L2
  mov eax, 1
  jmp .L3
.L2:
  mov eax, DWORD PTR [rsp+4]
  sub eax, 1
  mov edi, eax
  call factorial(int)
  imul eax, DWORD PTR [rsp+4]
.L3:
  add rsp, 24                    ; release the frame without using rbp
  ret
Dynamic Frame Size
// crt_malloca_simple.c
#include <stdio.h>
#include <malloc.h>

void Fn() {
  char *buf = (char *)_malloca(100);
  // do something with buf
  _freea(buf);  // required: _malloca may fall back to the heap
}

int main() {
  Fn();
}
What are the security risks of frames?
int foo() {
  int a, b;
  int *p = &a;
  scanf("%d", &b);
  *(p+b) = 5;
}

.LC0:
  .string "%d"
foo:
  push rbp
  mov rbp, rsp
  sub rsp, 16
  lea rax, [rbp-12]
  mov QWORD PTR [rbp-8], rax
  lea rax, [rbp-16]
  mov rsi, rax
  mov edi, OFFSET FLAT:.LC0
  mov eax, 0
  call __isoc99_scanf
  mov eax, DWORD PTR [rbp-16]
  cdqe
  lea rdx, [0+rax*4]
  mov rax, QWORD PTR [rbp-8]
  add rax, rdx
  mov DWORD PTR [rax], 5
  nop
  leave
  ret
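To see why the unchecked write *(p+b) = 5 is dangerous, the following Python sketch maps a value of b to the frame slot that would be overwritten. The offsets of a, b, and p are read off the listing above; the slot table and the name slot_written are illustrative assumptions, and the model only holds for this particular frame layout.

```python
# Toy model of foo's frame: 4-byte slots addressed by their offset from rbp.
# From the listing: b lives at rbp-16, a at rbp-12, p at rbp-8 (8 bytes).
# After "push rbp; mov rbp, rsp" the saved rbp is at [rbp] and the return
# address is at [rbp+8].
A_OFFSET = -12
SLOT_NAMES = {
    -16: "b",
    -12: "a",
    -8: "p (low half)",
    -4: "p (high half)",
    0: "saved rbp (low half)",
    4: "saved rbp (high half)",
    8: "return address (low half)",
    12: "return address (high half)",
}


def slot_written(b: int) -> str:
    """Name the slot hit by *(p + b) = 5 when p == &a and ints are 4 bytes."""
    offset = A_OFFSET + 4 * b
    return SLOT_NAMES.get(offset, f"outside the modeled frame (rbp{offset:+d})")


for b in (0, -1, 3, 5):
    print(f"b = {b:2d}: writes {slot_written(b)}")
```

With b supplied by the user, the store can reach the saved frame pointer or the return address, which is exactly the state the callee relies on to return safely.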
Buffer Overflow Exploits
void foo (char *x) {
char buf[2];
strcpy(buf, x);
}
int main (int argc, char *argv[]) {
foo(argv[1]);
}
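./a.out abracadabra
Segmentation fault

[Figure: stack layout for foo, from higher to lower memory addresses: previous frame, return address, saved FP, char* x, buf[2]. The stack grows toward lower addresses. The input is shown filling buf and the slots above it two characters at a time: ab, ra, ca, da, br, …]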
Buffer Overflow Exploits
int check_authentication(char *password) {
  int auth_flag = 0;
  char password_buffer[16];

  strcpy(password_buffer, password);
  if (strcmp(password_buffer, "brillig") == 0) auth_flag = 1;
  if (strcmp(password_buffer, "outgrabe") == 0) auth_flag = 1;
  return auth_flag;
}

int main(int argc, char *argv[]) {
  if (check_authentication(argv[1])) {
    printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    printf(" Access Granted.\n");
    printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
  }
  else printf("\nAccess Denied.\n");
}
(source: "Hacking: The Art of Exploitation", 2nd Ed.)
Input Validation
[Figure: evil input fed to the Application]
AAAAAAAAAAAA
-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Access Granted.
-=-=-=-=-=-=-=-=-=-=-=-=-=-
65
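The effect of the overflow on check_authentication can be imitated with a small Python model. This is only a sketch under a loudly stated assumption: the 16-byte password_buffer is taken to sit directly below the 4-byte auth_flag in the frame. Real compilers may order or pad locals differently, so the exact input length that flips the flag varies.

```python
import struct

BUF_SIZE = 16  # char password_buffer[16]


def check_authentication(password: bytes) -> int:
    # ASSUMED layout: password_buffer (16 bytes) immediately followed by
    # auth_flag (4 bytes, little-endian). This is an illustration only.
    frame = bytearray(BUF_SIZE + 4)
    # strcpy copies every byte plus the terminating NUL with no bounds check.
    data = password + b"\x00"
    if len(data) > len(frame):
        raise MemoryError("write runs past the modeled frame")
    frame[0:len(data)] = data
    # strcmp sees the buffer contents up to the first NUL.
    text = bytes(frame[:BUF_SIZE]).split(b"\x00", 1)[0]
    if text in (b"brillig", b"outgrabe"):
        struct.pack_into("<i", frame, BUF_SIZE, 1)  # auth_flag = 1
    (auth_flag,) = struct.unpack_from("<i", frame, BUF_SIZE)
    return auth_flag


print(check_authentication(b"brillig"))  # valid password
print(check_authentication(b"wrong"))    # rejected
print(check_authentication(b"A" * 17))   # overflow: flag becomes nonzero
```

In this model any input longer than 16 bytes spills into auth_flag, and since C treats every nonzero value as true, access is granted without a valid password.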
Preventing buffer overflow exploits?
The rest of this lecture
• X86 Principles
• AST to X86 translation
• The labeling algorithm for register allocation
• Intermediate Representations
X86 Assembly
• CISC
• 2-address instructions: [op arg1, arg2] means arg1 ← op(arg1, arg2)
• Diverse data types: 8-, 16-, 32-, 64-bit values + floating point, …
• Intel 64 and IA-32 architectures have a huge number of functions
• Instructions range in size from 1 byte to 15 bytes
• Lots of hold-over design decisions for backwards compatibility
• Hard to understand
• The main ideas can be explained using a simple subset, X86lite:
  • Only 64-bit signed integers (no floating point, no 16-bit, no …)
  • 20 instructions
X86lite Registers: 16 64-bit registers
register        usage
rax             general purpose accumulator
rbx             base register, pointer to data
rcx             counter register for strings & loops
rdx             data register for I/O
rsi             pointer register, string source register
rdi             pointer register, string destination register
rbp             base pointer, points to the stack frame
rsp             stack pointer, points to the top of the stack
r8-r15          general purpose registers
rip (virtual)   current machine instruction
Jumps, Call and Return
Instruction   Informal                                                                                Formal
jmp dst       Control goes to dst                                                                     rip ← dst
call dst      Control goes to dst and returns to the following instruction upon termination of dst   push rip; rip ← dst
ret           Control returns to the caller                                                           pop rip
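The formal column can be read as a tiny state machine over rip, rsp, and memory. The Python sketch below models exactly that; the class name Machine, the 8-byte word size, and all addresses are illustrative assumptions rather than part of X86lite.

```python
class Machine:
    """Toy model of rip, rsp, and a downward-growing stack of 8-byte words."""

    def __init__(self, rsp: int = 0x1000):
        self.rip = 0
        self.rsp = rsp
        self.mem = {}  # address -> 64-bit word

    def push(self, value: int) -> None:
        self.rsp -= 8
        self.mem[self.rsp] = value

    def pop(self) -> int:
        value = self.mem[self.rsp]
        self.rsp += 8
        return value

    def jmp(self, dst: int) -> None:
        self.rip = dst                 # rip <- dst

    def call(self, dst: int, next_instr: int) -> None:
        self.push(next_instr)          # push rip (address of the next instruction)
        self.rip = dst                 # rip <- dst

    def ret(self) -> None:
        self.rip = self.pop()          # pop rip


m = Machine()
m.call(dst=0x400, next_instr=0x104)    # hypothetical addresses
print(hex(m.rip), hex(m.rsp))
m.ret()
print(hex(m.rip), hex(m.rsp))
```

The return address lives in ordinary stack memory between call and ret, which is why the stack-smashing examples earlier in the lecture can redirect control flow.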
Enter and Leave
Instruction    Informal                               Formal
enter #bytes   Open a stack frame of size #bytes      push rbp; mov rbp, rsp; sub rsp, #bytes
leave          Restore caller's stack frame           mov rsp, rbp; pop rbp
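A quick way to convince oneself that leave undoes enter is to run the two expansions on a toy state. The sketch below does this in Python; the dictionary-based state and the initial register values are assumptions chosen for illustration.

```python
def push(st, value):
    st["rsp"] -= 8
    st["mem"][st["rsp"]] = value


def pop(st):
    value = st["mem"][st["rsp"]]
    st["rsp"] += 8
    return value


def enter(st, nbytes):
    push(st, st["rbp"])      # push rbp
    st["rbp"] = st["rsp"]    # mov rbp, rsp
    st["rsp"] -= nbytes      # sub rsp, #bytes


def leave(st):
    st["rsp"] = st["rbp"]    # mov rsp, rbp
    st["rbp"] = pop(st)      # pop rbp


state = {"rsp": 0x1000, "rbp": 0x2000, "mem": {}}  # hypothetical initial values
enter(state, 16)
print(hex(state["rbp"]), hex(state["rsp"]))  # 0xff8 0xfe8
leave(state)
print(hex(state["rbp"]), hex(state["rsp"]))  # 0x2000 0x1000
```

Because leave reloads rsp from rbp, the callee does not need to remember how many bytes it allocated, which is also what makes dynamically sized frames workable.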
Directly Translating AST to Assembly
• For simple languages, no need for intermediate representation
• Main Idea: Maintain invariants
  • Code emitted for a given expression computes the answer into rax
• Key Challenges:
  • storing intermediate values needed to compute complex expressions
  • some instructions use specific registers (e.g. shift)
Calling Conventions
• Specify the locations (e.g. register or stack) of arguments passed to a function and of the values it returns
• Designate registers either
  • Caller Save: e.g. freely usable by the called code
  • Callee Save: e.g. must be restored by the called code
• Define the protocol for deallocating stack-allocated arguments
  • Caller cleans up
  • Callee cleans up (makes variable arguments harder)

int64_t g(int64_t a, int64_t b) {   // callee
  return a + b;
}
int64_t f(int64_t x) {              // caller
  int64_t ans = g(3,4) + x;
  return ans;
}
x64 Calling Conventions: Caller Protocol
Callee Prolog
Callee Invariant: function argument
Callee Invariant: callee-saved registers
Callee epilogue
Caller-Save and Callee-Save Registers
• Callee-save registers (MIPS 16-23; X86 rbx, rbp, rsp, r12-r15)
  • Saved by the callee when modified
  • Values are automatically preserved across calls
• Caller-save registers
  • Saved by the caller when needed
  • Values are not automatically preserved
• Usually the architecture defines caller-save and callee-save registers
  • Separate compilation
  • Interoperability between code produced by different compilers/languages
• But compilers can decide when to use caller-save/callee-save registers
Caller-Save vs. Callee-Save Registers
int foo(int a) {
int b=a+1;
f1();
g1(b);
return(b+2);
}
void bar (int y) {
int x=y+1;
f2(y);
g2(2);
}
Syntax Directed Code Generation (Expressions)
• Generate code for arguments in a designated register and store in stack
• Generate code for expressions using stack operations
Naïve Code Generation: Expression
generateCode(node: expression) {
  switch node: {
    case number(n: integer) {
      emit(mov eax, $n)
    }
    case localVariable(v: symbol) {
      let o: integer = offsetFrame(v)
      emit(mov eax, DWORD PTR [rbp-$o])
    }
    case e1: Node + e2: Node {
      generateCode(e1)      // Generate code for lhs into eax
      emit(push eax)        // Store lhs on the stack
      generateCode(e2)      // Generate code for rhs into eax
      emit(mov edx, eax)    // rhs into edx
      emit(pop eax)         // lhs into eax
      emit(add eax, edx)
    }
    …
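The pseudocode above can be turned into a small runnable sketch. The Python version below handles only numbers, local variables, and addition; the class names Num, Var, Add and the offsets dictionary (standing in for the frame-offset lookup) are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: object
    right: object


def generate_code(node, offsets, out):
    """Append code to `out` that leaves the value of `node` in eax."""
    if isinstance(node, Num):
        out.append(f"mov eax, {node.value}")
    elif isinstance(node, Var):
        out.append(f"mov eax, DWORD PTR [rbp-{offsets[node.name]}]")
    elif isinstance(node, Add):
        generate_code(node.left, offsets, out)    # lhs into eax
        out.append("push eax")                    # save lhs on the stack
        generate_code(node.right, offsets, out)   # rhs into eax
        out.append("mov edx, eax")                # rhs into edx
        out.append("pop eax")                     # lhs back into eax
        out.append("add eax, edx")
    else:
        raise TypeError(f"unsupported node: {node!r}")
    return out


# 5 + (x + 7), with x assumed to live at rbp-16
code = generate_code(Add(Num(5), Add(Var("x"), Num(7))), {"x": 16}, [])
print("\n".join(code))
```

The invariant does all the work: every recursive call may clobber eax and edx freely because the only value that must survive, the left operand, is parked on the stack.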
Abstract Syntax for Arithmetic Expressions
Exp → id (IdExp)
Exp → num (NumExp)
Exp → Exp Binop Exp (BinExp)
Binop → + (Plus)
Binop → - (Minus)
Binop → * (Times)
Binop → / (Div)
Exp → Unop Exp (UnExp)
Unop → - (UnMin)
package Absyn;
abstract public class Absyn { public int pos; }

abstract class Exp extends Absyn {}

class IdExp extends Exp {
  String rep;
  IdExp(String r) { rep = r; }
}

class NumExp extends Exp {
  int number;
  NumExp(int n) { number = n; }
}

class OpExp {
  public final static int PLUS = 1;
  public final static int Minus = 2;
  public final static int Times = 3;
  public final static int Div = 4;
}

class BinExp extends Exp {
  Exp left, right;
  int op;  // one of OpExp.PLUS, OpExp.Minus, OpExp.Times, OpExp.Div
  BinExp(Exp l, int o, Exp r) {
    left = l; op = o; right = r;
  }
}
Java Code For Expressions
static void codeGen(Exp e) {
  if (e instanceof IdExp) {
    System.out.printf("mov eax, DWORD PTR [rbp-%d]\n", offset(((IdExp) e).rep));
  } else if (e instanceof NumExp) {
    System.out.printf("mov eax, %d\n", ((NumExp) e).number);
  } else if (e instanceof BinExp) {
    BinExp eb = (BinExp) e;
    codeGen(eb.left);                       // Generate code computing lhs into eax
    System.out.printf("push eax\n");        // Push lhs onto the stack
    codeGen(eb.right);                      // Generate code computing rhs into eax
    System.out.printf("mov edx, eax\n");    // rhs into edx
    System.out.printf("pop eax\n");         // lhs into eax
    switch (eb.op) {
      case OpExp.PLUS: System.out.printf("add eax, edx\n"); break;
      …
    }
    …
Example Compilation
[Figure: AST for 5 + (x + 7), where x is stored at rbp-16; each node is annotated with the code it emits]

mov eax, 5
push eax
mov eax, DWORD PTR [rbp-16]
push eax
mov eax, 7
mov edx, eax
pop eax
add eax, edx
mov edx, eax
pop eax
add eax, edx
[Figure: memory layout with Code/Data and Stack regions; rbp and rsp delimit the current frame, and the slot at rbp-16 holds x = 80]

Executing the generated code

Instruction                     eax   edx   stack (bottom → top)
(initial state)                 -     -     (empty)
mov eax, 5                      5     -     (empty)
push eax                        5     -     5
mov eax, DWORD PTR [rbp-16]     80    -     5
push eax                        80    -     5, 80
mov eax, 7                      7     -     5, 80
mov edx, eax                    7     7     5, 80
pop eax                         80    7     5
add eax, edx                    87    7     5
mov edx, eax                    87    87    5
pop eax                         5     87    (empty)
add eax, edx                    92    87    (empty)
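The execution above can be replayed mechanically. The following Python sketch interprets just the four instructions used by the generated code (mov, push, pop, add); the function name run and the way the memory operand is looked up as a literal string are simplifying assumptions, not a real x86 model.

```python
def run(program, memory):
    """Execute mov/push/pop/add over registers eax and edx and a value stack."""
    regs = {"eax": 0, "edx": 0}
    stack = []

    def value(operand):
        if operand in regs:
            return regs[operand]
        if operand in memory:        # e.g. "DWORD PTR [rbp-16]"
            return memory[operand]
        return int(operand)          # immediate constant

    for line in program:
        op, _, rest = line.partition(" ")
        args = [a.strip() for a in rest.split(",")]
        if op == "mov":
            regs[args[0]] = value(args[1])
        elif op == "push":
            stack.append(value(args[0]))
        elif op == "pop":
            regs[args[0]] = stack.pop()
        elif op == "add":
            regs[args[0]] += value(args[1])
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return regs, stack


program = [
    "mov eax, 5", "push eax",
    "mov eax, DWORD PTR [rbp-16]", "push eax",
    "mov eax, 7", "mov edx, eax", "pop eax", "add eax, edx",
    "mov edx, eax", "pop eax", "add eax, edx",
]
regs, stack = run(program, {"DWORD PTR [rbp-16]": 80})  # x = 80
print(regs["eax"])  # 92
```

Eleven instructions and four stack accesses for two additions is the price of the simple invariant, which motivates the register-based scheme that follows.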
Can we generate more efficient code?
• How to better utilize machine registers
• Expression order does not matter
  • What is the result of "x = 1; ++x + (x + 1)"?
• The code of the right subtree can appear before the code of the left subtree
• Often leads to faster code with fewer registers and loads/stores
• Dynamic programming can be used to compute an "optimal" solution
Two Phase Solution
Dynamic Programming
Sethi & Ullman
• Bottom-up (labeling)
  • Compute for every subtree
    • The minimal number of registers needed
    • Weight
• Top-Down
  • Generate the code using labeling by preferring "heavier" subtrees (larger labeling)
• Can integrate spilling
The Labeling Principle
For a + node whose left subtree needs m registers and whose right subtree needs n registers:
• m > n: the node needs m registers
• m < n: the node needs n registers
• m = n: the node needs m+1 registers
The Labeling Algorithm
weight(node: expression): integer {
  switch node: {
    case number(n: integer): return 1;
    case localVariable(v: symbol): return 1;
    case e1: Node + e2: Node {
      let lw: integer = weight(e1);
      let rw: integer = weight(e2);
      if (lw < rw) return rw;
      else if (lw > rw) return lw;
      else return lw + 1;
    }
    …
  }
}
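As a runnable counterpart to the pseudocode, here is a minimal Python sketch of the bottom-up labeling pass. Representing a binary node as a tuple (op, left, right) and a leaf as a plain string or integer is an assumption made for brevity.

```python
def weight(node):
    """Minimal number of registers needed to evaluate `node` without spilling."""
    if not isinstance(node, tuple):      # leaf: number or local variable
        return 1
    _, left, right = node
    lw, rw = weight(left), weight(right)
    return max(lw, rw) if lw != rw else lw + 1


# b*b - 4*(a*c)
expr = ("-", ("*", "b", "b"), ("*", 4, ("*", "a", "c")))
print(weight(expr))  # 3
```

A left-leaning chain such as (a + b) + c keeps its weight at 2 however long it grows, while a perfectly balanced tree gains one register per level.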
Labeling the example (weight)
[Figure: expression tree for b*b - 4*(a*c), each node labeled with its weight]
- : 3
  * : 2
    b : 1
    b : 1
  * : 2
    4 : 1
    * : 2
      a : 1
      c : 1
Top-Down
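[Figure: the labeled tree, each node annotated with its weight and its target register T]
- (weight 3, T=R1)
  * (weight 2, T=R1)
    b (weight 1, T=R1)
    b (weight 1, T=R2)
  * (weight 2, T=R2)
    4 (weight 1, T=R2)
    * (weight 2, T=R3)
      a (weight 1, T=R3)
      c (weight 1, T=R2)

Generated code, in execution order (at each node the heavier child is evaluated first):
move R1, b
move R2, b
mult R1, R2
move R3, a
move R2, c
mult R3, R2
move R2, 4
mult R2, R3
sub R1, R2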
Generalizations
• More than two arguments for operators
  • Function calls
• Register/memory operations
• Multiple affected registers
• Spilling
  • Need more registers than available
Register Memory Operations
• add R1, X
• mult R1, X
• No need for registers to store right operands
Labeling the example (weight)
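[Figure: expression tree for b*b - 4*(a*c), labeled with weights when right operands may be read directly from memory]
- : 2
  * : 1
    b : 1
    b : 0
  * : 2
    4 : 1
    * : 1
      a : 1
      c : 0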
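With register-memory instructions the only change to the labeling is that a leaf appearing as a right operand costs no register. A minimal Python sketch, assuming the same tuple representation (op, left, right) and a hypothetical is_right_child flag:

```python
def weight(node, is_right_child=False):
    """Registers needed when a right-operand leaf can be used straight from memory."""
    if not isinstance(node, tuple):
        return 0 if is_right_child else 1
    _, left, right = node
    lw, rw = weight(left), weight(right, True)
    return max(lw, rw) if lw != rw else lw + 1


expr = ("-", ("*", "b", "b"), ("*", 4, ("*", "a", "c")))
print(weight(expr))  # 2
```

The same expression that needed three registers under the register-only model now fits in two.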
Top-Down
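[Figure: the labeled tree annotated with weights and target registers; the root targets R1 and its right subtree targets R2]
- (weight 2)
  * (weight 1)
    b (weight 1)
    b (weight 0)
  * (weight 2)
    4 (weight 1)
    * (weight 1)
      a (weight 1)
      c (weight 0)

Generated code, in execution order (the heavier right subtree of the root is evaluated first):
move R2, 4
move R1, a
mult R1, c
mult R2, R1
move R1, b
mult R1, b
sub R1, R2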
Empirical Results
• Experience shows that for handwritten programs 5 registers suffice (Yuval 1977)
• But program generators may produce arbitrarily complex expressions
Spilling
• Even an optimal register allocator can require more registers than available
• Need to generate code for every correct program
• The compiler can save temporary results
  • Spill registers into temporaries
  • Load when needed
• Many heuristics exist
Simple Spilling Method
• Heavy tree: needs more registers than available
• A "heavy" tree contains a "heavy" subtree whose dependents are "light"
• Generate code for the light tree
• Spill the content into memory and replace subtree by temporary
• Generate code for the resultant tree
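The steps above can be sketched in Python on top of the register-only labeling. Everything beyond the slide's outline is an assumption: the tuple representation, the temporary names T1, T2, …, the choice to spill the left child of the heavy node, and the use of "move Tn, Rk" as the store instruction.

```python
from itertools import count


def weight(node):
    if not isinstance(node, tuple):
        return 1
    _, left, right = node
    lw, rw = weight(left), weight(right)
    return max(lw, rw) if lw != rw else lw + 1


MNEMONIC = {"+": "add", "-": "sub", "*": "mult"}


def gen(node, regs, out):
    """Emit code leaving `node` in regs[0]; assumes len(regs) >= weight(node)."""
    if not isinstance(node, tuple):
        out.append(f"move {regs[0]}, {node}")
        return regs[0]
    op, left, right = node
    if weight(right) > weight(left):
        r = gen(right, [regs[1], regs[0]] + regs[2:], out)
        l = gen(left, [regs[0]] + regs[2:], out)
    else:
        l = gen(left, regs, out)
        r = gen(right, regs[1:], out)
    out.append(f"{MNEMONIC[op]} {l}, {r}")
    return l


def subtree(node, path):
    for index in path:
        node = node[index]
    return node


def replace(node, path, new):
    if not path:
        return new
    parts = list(node)
    parts[path[0]] = replace(node[path[0]], path[1:], new)
    return tuple(parts)


def find_heavy(node, k, path=()):
    """Path to a heavy node (weight > k) whose children are all light, else None."""
    if weight(node) <= k:
        return None
    for index in (1, 2):
        found = find_heavy(node[index], k, path + (index,))
        if found is not None:
            return found
    return path


def gen_with_spills(tree, regs):
    k = len(regs)
    if k < 2:
        raise ValueError("this sketch needs at least two registers")
    out, temps = [], count(1)
    while True:
        site = find_heavy(tree, k)
        if site is None:
            break
        child_path = site + (1,)                        # a light child of the heavy node
        r = gen(subtree(tree, child_path), regs, out)   # code for the light tree
        temp = f"T{next(temps)}"                        # hypothetical temporary name
        out.append(f"move {temp}, {r}")                 # spill into memory
        tree = replace(tree, child_path, temp)          # replace subtree by temporary
    gen(tree, regs, out)                                # code for the resulting tree
    return out


expr = ("-", ("*", "b", "b"), ("*", 4, ("*", "a", "c")))
code = gen_with_spills(expr, ["R1", "R2"])
print("\n".join(code))
```

Each spill replaces a subtree by a leaf, so the loop terminates, and with at least two registers the rewritten parent becomes light.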
Summary (Register allocation)
• Register allocation of expressions is simple
• Good in practice
• Optimal under certain conditions
  • Uniform instruction cost
  • "Symbolic" trees
• Can handle non-uniform cost
  • Code-Generator Generators exist (BURS)
• Even simpler for 3-address machines
• Simple ways to determine best orders
• But misses opportunities to share registers between different expressions
  • Can employ certain conventions
• Better solutions exist
  • Graph coloring
Why do something else?
• The resulting code quality is poor
• Richer source language features are hard to encode
  • Structured data types, objects, first-class functions, …
• Hard to optimize the resulting assembly code
  • The representation is too concrete, e.g. it has committed to using certain registers and the stack
  • Only a fixed number of registers
  • Some instructions have restrictions on where the operands are located
• Control-flow is not structured:
  • Arbitrary jumps from one code block to another
  • Implicit fall-through makes sequences of code non-modular (i.e. you can't rearrange sequences of code easily)
• Retargeting the compiler to a new architecture is hard.
• Target assembly code is hard-wired into the translation
Lecture Summary
• Simple X86 code generation from AST is conceptually easy
• But poor generated code
• No global optimizations
• No modularity
  • Hard to retarget to different machines
  • Hard to reuse for different source languages
  • Hard to maintain