Implementation of the Python Bytecode Compiler

Jeremy Hylton
Google


What to expect from this talk

• Intended for developers
• Explain key data structures and control flow
• Lots of code on slides

The New Bytecode Compiler

• Rewrote compiler from scratch for 2.5
  – Emphasizes modularity
  – Work was almost done for Python 2.4
  – Still uses original parser, pgen
• Traditional compiler abstractions
  – Abstract Syntax Tree (AST)
  – Basic blocks
• Goals
  – Ease maintenance, extensibility
  – Expose AST to Python programs

Compiler Architecture

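[Figure: compiler pipeline diagram]

Stages: Tokenizer → Parser → AST Converter → Code Generator → Assembler → Peephole Optimizer

Data passed between stages: Source Text → Tokens → Parse Tree → AST → Blocks → bytecode

Also shown in the diagram: __future__, Symbol Table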

Compiler Organization

File / component          Lines (approx.)
compile.c                  4,200
  infrastructure             700
  code generator           2,400
  assembler                  500
  peephole optimizer         600
asdl.c,.h                   <100
pyarena.c                    100
future.c                     100
ast.c                      3,000
symtable.c                 1,400
Python-ast.c,.h            1,900 (generated)
Total                     10,800

Tokenize, Parse, AST

• Simple, hand-coded tokenizer
  – Synthesizes INDENT and DEDENT tokens
• pgen: parser generator
  – Input in Grammar/Grammar
  – Extended LL(1) grammar
• AST conversion
  – Collapses parse tree into abstract form
  – Future: extend pgen to generate AST directly
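The INDENT and DEDENT synthesis can be observed from Python itself. The sketch below uses the standard library's pure-Python `tokenize` module, which is a separate implementation from the hand-coded C tokenizer described on the slide but follows the same token conventions. The sample source string and the helper name `token_names` are illustrative assumptions.

```python
import io
import tokenize

# Illustrative source: one indented block followed by a dedented statement.
SOURCE = "if x:\n    y = 1\nz = 2\n"


def token_names(source):
    """Return the token type names produced for a source string."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


names = token_names(SOURCE)
print(names)
```

No INDENT or DEDENT characters exist in the source text; the tokenizer emits them when the leading whitespace of a logical line changes.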

Grammar vs. Abstract Syntax

compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | funcdef | …
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
test: and_test ('or' and_test)* | lambdef
and_test: not_test ('and' not_test)*
not_test: 'not' not_test | comparison
comparison: expr (comp_op expr)*
comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'

stmt = For(expr target, expr iter, stmt* body, stmt* orelse)
     | If(expr test, stmt* body, stmt* orelse)
     | …

expr = BinOp(expr left, operator op, expr right)
     | Compare(expr left, cmpop* ops, expr* comparators)
     | Call(expr func, expr* args, keyword* keywords,
            expr? starargs, expr? kwargs)
     | …

AST node types

• Modules (mod)
• Statements (stmt)
• Expressions (expr)
  – Expressions allowed on LHS have context slot
• Extras
  – Slots, comprehension, excepthandler, arguments
  – Operator types
• FunctionDef is complex
  – Children in two namespaces

Example Code

    L = []
    for x in range(10):
        if x > 5:
            L.append(x * 2)
        else:
            L.append(x + 2)

Concrete Syntax Example

(if_stmt, (1, 'if'), (test, (and_test, (not_test, (comparison, (expr, (xor_expr, (and_expr, (shift_expr, (arith_expr, (term, (factor, (power, (atom, (1, 'x')))))))))), (comp_op, (21, '>')), (expr, (xor_expr, (and_expr, (shift_expr, (arith_expr, (term, (factor, (power, (atom, (2, '5')))))))))))))), (11, ':'), …

Abstract Syntax Example

For(Name('x', Store),
    Call(Name('range', Load), [Num(10)]),
    [If(Compare(Name('x', Load), [Gt], [Num(5)]),
        [Call(Attribute(Name('L', Load), Name('append', Load)),
              [BinOp(Name('x', Load), Mult, Num(2))])],
        [Call(Attribute(Name('L', Load), Name('append', Load)),
              [BinOp(Name('x', Load), Add, Num(2))])])])
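The same tree can be inspected with the `ast` module in current Python versions. This is a minimal sketch rather than the exact structure from the talk: node names have shifted since 2.5 (numeric literals are `Constant` nodes in recent releases, for example), so the code checks only the node kinds that have stayed stable. The variable names `for_node` and `if_node` are illustrative.

```python
import ast

SOURCE = """\
L = []
for x in range(10):
    if x > 5:
        L.append(x * 2)
    else:
        L.append(x + 2)
"""

tree = ast.parse(SOURCE)
for_node = tree.body[1]      # the `for` statement (body[0] is `L = []`)
if_node = for_node.body[0]   # the `if` nested inside the loop body

# The loop target is assigned to, so its context slot is Store;
# the comparison operator for `x > 5` is Gt.
print(type(for_node).__name__, type(for_node.target.ctx).__name__)
print(type(if_node.test).__name__, type(if_node.test.ops[0]).__name__)
print(ast.dump(if_node.test))
```

The exact text printed by `ast.dump` varies between Python versions, which is why the example does not rely on it.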

Our Goal: Bytecode

  2           0 BUILD_LIST               0
              3 STORE_FAST               1 (L)

  3           6 SETUP_LOOP              71 (to 80)
              9 LOAD_GLOBAL              1 (range)
             12 LOAD_CONST               1 (10)
             15 CALL_FUNCTION            1
             18 GET_ITER
        >>   19 FOR_ITER                57 (to 79)
             22 STORE_FAST               0 (x)

  4          25 LOAD_FAST                0 (x)
             28 LOAD_CONST               2 (5)
             31 COMPARE_OP               4 (>)
             34 JUMP_IF_FALSE           21 (to 58)
             37 POP_TOP

  5          38 LOAD_FAST                1 (L)
             41 LOAD_ATTR                3 (append)
             44 LOAD_FAST                0 (x)
             47 LOAD_CONST               3 (2)
             50 BINARY_MULTIPLY
             51 CALL_FUNCTION            1
             54 POP_TOP
             55 JUMP_ABSOLUTE           19
        >>   58 POP_TOP

  7          59 LOAD_FAST                1 (L)
             62 LOAD_ATTR                3 (append)
             65 LOAD_FAST                0 (x)
             68 LOAD_CONST               3 (2)
             71 BINARY_ADD
             72 CALL_FUNCTION            1
             75 POP_TOP
             76 JUMP_ABSOLUTE           19
        >>   79 POP_BLOCK
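A listing like the one above can be regenerated with the `dis` module. The sketch below wraps the example in a function, since the `STORE_FAST`/`LOAD_FAST` opcodes in the listing imply function scope; the function name `build_list` and the trailing `return L` are assumptions added so the function can be called. Opcode names, argument encodings, and offsets have changed considerably since Python 2.5, so the output on a current interpreter will not match the slide instruction for instruction.

```python
import dis


def build_list():
    L = []
    for x in range(10):
        if x > 5:
            L.append(x * 2)
        else:
            L.append(x + 2)
    return L


# Collect just the opcode names, then print the full disassembly.
opnames = [instr.opname for instr in dis.get_instructions(build_list)]
dis.dis(build_list)
```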

Strategy for Compilation

• Module-wide analysis
  – Check future statements
  – Build symbol table
    • For each variable: is it local, global, or free?
    • Makes two passes over block structure
• Compile one function at a time
  – Generate basic blocks
  – Assemble bytecode
  – Optimize generated code (out of order)
  – Code object stored in parent’s constant pool
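The last point, that a function's code object lives in its parent's constant pool, can be checked directly in CPython. A minimal sketch, with an illustrative source string and function name:

```python
import types

# Illustrative module source containing one function definition.
SOURCE = "def double(n):\n    return n * 2\n"

module_code = compile(SOURCE, "<example>", "exec")

# The compiled function body appears as a code object among the
# module code object's constants.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
print([c.co_name for c in nested])
```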

Symbol Table

• Collect basic facts about symbols, block
  – Variables assigned, used; params, global stmts
  – Check for import *, unqualified exec, yield
  – Other tricky details
• Identify free, cell variables in second pass
  – Parent passes bound names down
  – Child passes free variables up
  – Implicit vs. explicit global vars
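The results of this analysis are visible through the standard library's `symtable` module, which exposes the compiler's symbol tables to Python code. The sketch below is an illustration using a nested-function example; the variable names `outer` and `inner` and the helper `describe` are assumptions made for the example.

```python
import symtable

SOURCE = """\
def make_adder(n):
    x = n
    def adder(y):
        return x + y
    return adder
"""

module_table = symtable.symtable(SOURCE, "<example>", "exec")
outer = module_table.get_children()[0]   # scope of make_adder
inner = outer.get_children()[0]          # scope of adder


def describe(table, name):
    """Summarize how one name is classified within one scope."""
    sym = table.lookup(name)
    return {
        "local": sym.is_local(),
        "free": sym.is_free(),
        "param": sym.is_parameter(),
    }


print("x in make_adder:", describe(outer, "x"))
print("x in adder:     ", describe(inner, "x"))
print("y in adder:     ", describe(inner, "y"))
```

Here `x` is bound in `make_adder` and only used in `adder`, so the child scope reports it as free while the parent reports it as local.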

Name operations

• Five different load name opcodes
  – LOAD_FAST: array access for function locals
  – LOAD_GLOBAL: dict lookups for globals, builtins
  – LOAD_NAME: dict lookups for locals, globals
  – LOAD_DEREF: load free variable
  – LOAD_CLOSURE: loads cells to make closure
• Cells
  – Separate allocation for mutable variable
  – Stored in flat closure list
  – Separately garbage collected

Class namespaces

    class Spam:
        id = id(1)

  1           0 LOAD_GLOBAL              0 (__name__)
              3 STORE_NAME               1 (__module__)

  2           6 LOAD_NAME                2 (id)
              9 LOAD_CONST               1 (1)
             12 CALL_FUNCTION            1
             15 STORE_NAME               2 (id)
             18 LOAD_LOCALS
             19 RETURN_VALUE

Closures

    def make_adder(n):
        x = n
        def adder(y):
            return x + y
        return adder

    return make_adder

def make_adder(n):
  2           0 LOAD_FAST                0 (n)
              3 STORE_DEREF              0 (x)

  3           6 LOAD_CLOSURE             0 (x)
              9 LOAD_CONST               1 (<code>)
             12 MAKE_CLOSURE             0
             15 STORE_FAST               2 (adder)

  5          18 LOAD_FAST                2 (adder)
             21 RETURN_VALUE

def adder(y):
  4           0 LOAD_DEREF               0 (x)
              3 LOAD_FAST                0 (y)
              6 BINARY_ADD
              7 RETURN_VALUE
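The cell machinery behind `STORE_DEREF`, `LOAD_CLOSURE`, and `LOAD_DEREF` can be observed on the resulting function objects. A small sketch using the same `make_adder` example; the name `add5` is illustrative:

```python
def make_adder(n):
    x = n
    def adder(y):
        return x + y
    return adder


add5 = make_adder(5)

# x is a cell variable in the enclosing function and a free variable
# in the nested one; the closure tuple holds the shared cell.
cellvars = make_adder.__code__.co_cellvars
freevars = add5.__code__.co_freevars
cell = add5.__closure__[0]

print(cellvars, freevars, cell.cell_contents, add5(3))
```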

Code generation input

• Discriminated unions
  – One for each AST type
  – Struct for each option
  – Constructor functions
• Literals
  – Stored as PyObject*
  – ast pass parses
• Identifiers
  – Also PyObject* (string)

typedef struct _stmt *stmt_ty;
struct _stmt {
    enum { ..., For_kind=8, While_kind=9, If_kind=10, ... } kind;
    union {
        struct { expr_ty target; expr_ty iter;
                 asdl_seq *body; asdl_seq *orelse; } For;
        struct { expr_ty test;
                 asdl_seq *body; asdl_seq *orelse; } If;
    } v;    /* accessed as s->v.If.test, s->v.For.iter, ... */
    int lineno;
};

Code generation output

• Basic blocks
  – Start with jump target
  – Ends if there is a jump
  – Function is graph of blocks
• Instructions
  – Opcode + argument
  – Jump targets are pointers
• Helper functions
  – Create new blocks
  – Add instr to current block

struct instr {
    unsigned char i_opcode;
    int i_oparg;
    struct basicblock_ *i_target;
    int i_lineno;
    // plus some one-bit flags
};

struct basicblock_ {
    int b_iused;
    int b_ialloc;
    struct instr *b_instr;
    struct basicblock_ *b_next;
    int b_startdepth;
    int b_offset;
    // several details elided
};
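To make the shape of this output concrete, here is a rough Python analogue of the two structs and the helper functions. It is a teaching sketch, not a translation of compile.c: the class names `Instr`, `BasicBlock`, and `Unit`, and the method names, are hypothetical, and fields such as stack depth and offsets are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instr:
    opcode: str
    oparg: Optional[int] = None
    # A jump target is a reference to a block, not a byte offset.
    target: Optional["BasicBlock"] = field(default=None, repr=False)
    lineno: int = 0


@dataclass(eq=False)
class BasicBlock:
    instrs: List[Instr] = field(default_factory=list)
    next: Optional["BasicBlock"] = field(default=None, repr=False)  # fall-through


class Unit:
    """Hypothetical per-function state: entry block plus the block being filled."""

    def __init__(self):
        self.entry = BasicBlock()
        self.current = self.entry

    def new_block(self):
        return BasicBlock()

    def use_next_block(self, block):
        # Link the block in as the fall-through successor and start filling it.
        self.current.next = block
        self.current = block

    def add_instr(self, opcode, oparg=None, target=None, lineno=0):
        self.current.instrs.append(Instr(opcode, oparg, target, lineno))


u = Unit()
end = u.new_block()
u.add_instr("LOAD_FAST", 0, lineno=1)
u.add_instr("JUMP_IF_FALSE", target=end, lineno=1)
u.add_instr("POP_TOP", lineno=1)
u.use_next_block(end)
u.add_instr("POP_TOP", lineno=1)
print(len(u.entry.instrs), len(end.instrs))
```

Keeping jump targets as block references means the code generator never has to know byte offsets; those are computed later, when the blocks are laid out in order.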

Code generation

• One visitor function for each AST type
  – Switch on kind enum
  – Emit bytecodes
  – Return immediately on error
• Heavy use of C macros
  – ADDOP(), ADDOP_JREL(), …
  – VISIT(), VISIT_SEQ(), …
  – Hides control flow

Code generation example

static int compiler_if(struct compiler *c, stmt_ty s) {
    basicblock *end, *next;
    if (!(end = compiler_new_block(c)))
        return 0;
    if (!(next = compiler_new_block(c)))
        return 0;
    VISIT(c, expr, s->v.If.test);
    ADDOP_JREL(c, JUMP_IF_FALSE, next);
    ADDOP(c, POP_TOP);
    VISIT_SEQ(c, stmt, s->v.If.body);
    ADDOP_JREL(c, JUMP_FORWARD, end);
    compiler_use_next_block(c, next);
    ADDOP(c, POP_TOP);
    if (s->v.If.orelse)
        VISIT_SEQ(c, stmt, s->v.If.orelse);
    compiler_use_next_block(c, end);
    return 1;
}

Assembler

• Lots of fiddly details
  – Linearize code
  – Compute stack space needed
  – Compute line number table (lnotab)
  – Compute jump offsets
  – Call PyCode_New()
• Peephole optimizer
  – Integrated at wrong end of assembler
  – Constant folding, simplify jumps
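Two of those steps, linearizing the blocks and computing jump offsets, can be sketched in a few lines of Python. This toy model makes several assumptions: blocks are plain lists named by strings, an instruction takes 1 byte without an argument and 3 bytes with one (consistent with the offsets in the disassembly shown earlier), and every jump is relative to the instruction that follows it. Absolute jumps, extended arguments, stack depth, and the line number table are all left out.

```python
# Each block is a list of (opcode, arg) pairs. An arg is None (no argument),
# an int (ordinary argument), or a str naming the target block of a jump.
blocks = {
    "entry": [("LOAD_FAST", 0), ("JUMP_IF_FALSE", "else"), ("POP_TOP", None),
              ("LOAD_CONST", 1), ("POP_TOP", None), ("JUMP_FORWARD", "end")],
    "else": [("POP_TOP", None), ("LOAD_CONST", 2), ("POP_TOP", None)],
    "end": [("LOAD_CONST", 0), ("RETURN_VALUE", None)],
}
order = ["entry", "else", "end"]   # the chosen linear layout


def instr_size(arg):
    """1 byte for a bare opcode, 3 bytes for opcode plus 2-byte argument."""
    return 1 if arg is None else 3


def assemble(blocks, order):
    # Pass 1: compute the starting offset of every block.
    offsets, pos = {}, 0
    for name in order:
        offsets[name] = pos
        pos += sum(instr_size(arg) for _, arg in blocks[name])

    # Pass 2: emit instructions, replacing block names with relative offsets.
    listing, pos = [], 0
    for name in order:
        for opcode, arg in blocks[name]:
            size = instr_size(arg)
            if isinstance(arg, str):
                arg = offsets[arg] - (pos + size)   # relative to next instruction
            listing.append((pos, opcode, arg))
            pos += size
    return listing


listing = assemble(blocks, order)
for offset, opcode, arg in listing:
    print(offset, opcode, "" if arg is None else arg)
```

The same arithmetic can be checked against the earlier listing: `JUMP_IF_FALSE 21` at offset 34 lands on 34 + 3 + 21 = 58.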

AST transformation

• Expose AST to Python programmers
  – Simplify analysis of programs
  – Generate code from modified AST
• Example:
  – Implement with statement as AST transform
• Ongoing work
  – BOF this afternoon at 3:15, Preston Trail
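In current Python versions this workflow is available through the `ast` module. The sketch below shows the general idea of generating code from a modified AST, using an API that postdates the talk; the transform itself (`AddToMult`, which rewrites additions as multiplications) is a deliberately trivial, hypothetical example and not the `with` statement transform mentioned above.

```python
import ast


class AddToMult(ast.NodeTransformer):
    """Hypothetical transform: rewrite every `a + b` as `a * b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)          # transform nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


tree = ast.parse("result = 3 + 4")
tree = ast.fix_missing_locations(AddToMult().visit(tree))

# Compile the modified tree and run it in a fresh namespace.
namespace = {}
exec(compile(tree, "<transformed>", "exec"), namespace)
print(namespace["result"])
```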

Loose ends

• compiler package
  – Should revise to support new AST types
  – Tricky compatibility issue
• Revise pgen to generate AST directly
• Develop toolkit for AST transforms
• Extend analysis, e.g. PEP 267