23
CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode compiler

CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

Embed Size (px)

Citation preview

Page 1: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS412/413

Introduction toCompilers and Translators

May 3, 1999

Lecture 34: Compiler-like Systems

JIT

byteco

de

interpre

ter

src-to-srctranslator

bytecodecompiler

Page 2: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

2

Administration

• Project code must be turned in Friday even if you haven’t finished the programming assignment -- at minimum, turn in something that can compile programs.

• If parts of program are incomplete, include explanation of design, and components that are working

Page 3: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

3

A CompilerLexical analysis

Parsing

Intermediate Code Generation

MIR Optimization

Code generation

LIR optimization

HIR Optimization

Linking & Loading

“CompilerClassic”

bytecodecompiler

JITcompiler

front end

back end

source-to-source

translator

interpreter

Page 4: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

4

Other options• Compile from source to another high-level

language (source-to-source translator), let other compiler deal with code gen, etc.

• Compile from source to an intermediate code format for which a back end already exists (ucode, RTF, LCC, ...)

• Compile from source to an intermediate code format executable by an interpreter– abstract syntax tree– bytecodes– threaded code (call- or pointer-threaded)

• Advantage: Architecture independence

Page 5: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

5

Source-to-source translator

• Idea: choose well-supported high-level language (e.g. C, C++, Java) as target

• Translate AST to high-level language constructs instead of to IR, pass translated code off to underlying compiler

• Advantage: easy, can leverage good underlying compiler technology. Examples: C++ (to C), PolyJ (to Java), Toba (JVM to C)

• Disadvantages: target language won’t support all features, optimization harder in target language, language may impose extra checks

Page 6: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

6

Compiling to C• C doesn’t impose extra checks, is

reasonably close to assembly (but can’t support static exception tables!), widely available

• Mismatch: doesn’t allow statements underneath expressions; need to translate and canonicalize in one step

• Translation of an expression into C is:– a sequence of statements to be executed– an expression to be evaluated afterward

E S1;…;Sn , E’

Page 7: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

7

Translation rules• Translation still can be performed

by recursive traversal of AST• IotaC rules:

E S , E’

id = E; S , id = E’;

Si Si’ , Ei’

{ S1;…; Sn } S1’;…; Sn’ , En’block

assignment

Page 8: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

8

Translating to Java• Same problems as C, plus: Java is

type-safe (good in a HLL, not so good in an intermediate language!)

• May need to use cast and instanceof expressions in generated code: slow in Java

Page 9: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

9

Back Ends• Several standard intermediate code

formats exist with back ends for various architectures -- can reuse back ends– p-code: very old stack machine format– UCODE: old Stanford/MIPS stack machine

format– Java bytecode: new stack machine format– RTL: GNU gcc, etc.– SUIF: Stanford format for optimization– LCC: Lightweight C compiler

Page 10: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

10

Intermediate code formats• Quadruples

– compact, similar to machine code, good for standard optimization techniques

• Stack-machine– easy to generate code for– not compact (contrary to popular opinion)– can be converted back into quadruples– used by some (sort of) high-level

languages: FORTH, PostScript, HP calculators

Page 11: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

11

Stack machine format• Code is a sequence of stack

operations (not necessarily the same stack as the call stack)

push CONST : add CONST to the top of stackpop : discard top of stackstore : in memory location specified by

top of stack store element just below.load : replace top of stack with memory

location it points to+, *, /, -, … : replace top two elems w/

result of operation

Page 12: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

12

Generating code• Stack operations mostly don’t name

operands (implicit): can code in 1 byte• Expression is translated to code that

leaves its value on the top of stack

• Translation of E1 + E2:

E1 + E2 E1; E2; +

• Translation of id = E : id = E E ; push addr(id); store• Good for very simple code generation

Page 13: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

13

Verbosity• Values get “trapped” down low on

stack especially with subexpression elim.

• Need instruction that re-pushes element at known stack index on top; instruction is used often

• Might as well have abstract register operands!

• Result: less compact than a register-based format; extra copies of data too

Page 14: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

14

Stack machine register machine

• At each point in code, keep track of stack depth (if possible)

• Assign temporaries according to depth

• Replace stack operands with quadruples using these temporariesa b c

a + b*c

push a ; 0push b ; 1push c ; 2* ; 1+ ; 0

t0 = at1 = bt2 = ct1 = t1 * t2t0 = t0 + t1

0 1 2

a b*ca+b*c

Page 15: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

15

JIT compilers• Particularly widely available back end

with well-defined intermediate code (JVM bytecode)

• Generate code by reconstructing registers as discussed

• Compilation is done on-the-fly: generating code quickly is essential generated code quality is usually low

• HotSpot: new Sun JIT. High-quality optimization (esp. inlining and specialization), but used sparingly

Page 16: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

16

Interpreters• “Why generate machine code at

all? Just run it. Processors are really fast”

• Options:– token interpreters (parsing on the

fly) -- reallly slow (>1000x)– AST interpreters -- 300x– threaded interpreters -- 20-50x– bytecode interpreters -- 10-30x

Page 17: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

17

AST interpreters“Yet another recursive traversal”• For every node type in AST, add

method Object evaluate(RunTimeContext r)• Evaluate method is implemented

recursivelyObject PlusNode.evaluate(r) {

return left.evaluate(r).plus(right.evaluate(r)); }

• Variables, etc. looked up in r; some help from AST yields big speed-ups (e.g. pre-computed variable locations)

Page 18: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

18

Threaded interpreters• Code is represented as a bunch of blocks of

memory connected by pointers. Blocks are either code blocks or pointer blocks.

• Execution is (mostly) depth-first traversal• Each block starts with a single jump

instruction

JMP nextJMP

machinecode

RET

Page 19: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

19

Threaded interpreters• Can call blocks directly regardless of

kind• Conditional branches, other control

structure embedded in special machine code blocks that munge stack

• Interpreter proper is ~10 instructions of code: very space-efficient!

• Long favored by FORTH fans: whole language runtime fits in 4K memory

Page 20: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

20

Call-threaded interpreters• Instead of pointers in pointer blocks,

store CALL instructions• Depth-first traversal done by processor

automatically• Faster, trivial code generation required

CALLCALLCALL

machinecode

RETCALL

RETCALLRET

Page 21: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

21

Bytecode interpreters• Problem with threaded interpreters:

operations with immediate arguments require multiple words (code bloat)

• Observation: most interpreter instructions can be encoded in less than a word, many even in just a byte (esp. with stack-machine model)

• Tradeoff: interpreter size vs. executable code size

Page 22: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

22

Implementing a bytecode interpreter

• Bytecode interpreter simulates a simple architecture (either stack or register machine)

• Interpreter state:– current code pointer– current simulated return stack– current registers or stack & stack pointer

• Interpreter code is a big loop containing a switch over kinds of bytecode instructions– one big function: optimization techniques work

well. Ideally: no recursion on function calls

• Result: about 10x slowdown if done right

Page 23: CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode

CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers

23

Summary• Building a new system for executing

code doesn’t require construction of a full compiler

• Cost-effective strategies: source-to-source translation or translation to an existing intermediate code format

• Most material covered in this course is still helpful for these approaches

• High performance: translate to C• Portability, extensibility: translate to Java

or JVM