A talk given at PyOhio 2010 (and to the Chicago Python Users group) on PyPy and Unladen Swallow.
PyPy and Unladen Swallow - Making Python Fast
Friday, July 30, 2010
Why is Python slow?
Why is CPython slow?
What are we going to do about it?
Why is Python Slow
Python the abstract language, not the implementation.
Very dynamic.
Almost nothing known at compile time.
Frame introspection.
Object model.
Globals/Builtins
Frame Introspection
import sys

def f():
    a = 3
    g()

def g():
    try:
        raise Exception
    except Exception, e:
        frame = sys.exc_info()[2].tb_frame
        print frame.f_back.f_locals["a"]

f()
Object Model
class A(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

o = A(a=1, b=2)
print o.a
Dynamic
def f(a, b):
    print a + b
Globals/Builtins
def f(l):
    yield len(l)
    yield len(l)

for i in f([3]):
    print i
    len = lambda o: 3
Why is CPython Slow
“Primitive” bytecode VM
Value boxing
Reference counting
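The three points above can be observed from inside CPython with the standard `dis` and `sys` modules. This is a small sketch rather than anything shown in the talk: `add` is a hypothetical function, the exact opcode names and sizes vary between CPython versions, and the code is written to run on both Python 2.6+ and Python 3.

```python
import dis
import sys

def add(a, b):
    return a + b

# Bytecode VM: the interpreter dispatches on one generic opcode per
# operation, whatever the operand types turn out to be at run time.
dis.dis(add)

# Value boxing: even a small integer is a full heap object with a header,
# so it takes far more than a machine word (CPython-specific).
boxed_size = sys.getsizeof(12345)

# Reference counting: binding another name to an object bumps its count.
obj = object()
before = sys.getrefcount(obj)
alias = obj
after = sys.getrefcount(obj)
```

Every one of those count updates is a memory write the interpreter has to perform, which is part of the cost the slide refers to.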
What Are We Going To Do About It
Unladen Swallow
PyPy
Unladen Swallow
Google-funded branch of CPython
Started as a branch of CPython 2.6
PEP 3146 - Merging Unladen Swallow into Py3k
LLVM based function JIT
LLVM
Low Level Virtual Machine
Not a VM like CPython.
Takes the Python representation of a function (its bytecode), turns it into LLVM's representation of a function (LLVM IR), and generates machine code.
Includes all sorts of optimizations and code generators.
JIT
Profile and see which functions are called the most.
Record what types are seen for each operation.
Emit optimized machine code (that bails back to the interpreter if guards fail).
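The three steps above can be modelled in plain Python. This is a toy sketch under stated assumptions, not Unladen Swallow's implementation: `ProfiledAdd`, `HOT_THRESHOLD`, and the closure standing in for "machine code" are all hypothetical names, and `collections.Counter` needs Python 2.7 or later.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: calls seen before we "compile"

class ProfiledAdd(object):
    """Toy model of a profiling function JIT for the operation `a + b`."""

    def __init__(self):
        self.calls = 0
        self.seen_types = Counter()
        self.fast_path = None   # stands in for emitted machine code
        self.bailouts = 0

    def generic(self, a, b):
        # The "interpreter": fully dynamic, and it records type feedback.
        self.seen_types[(type(a), type(b))] += 1
        return a + b

    def compile(self):
        # Specialize for the most commonly observed pair of operand types.
        (ta, tb), _ = self.seen_types.most_common(1)[0]

        def specialized(a, b):
            # Guard: the types assumed at compile time must still hold.
            if type(a) is not ta or type(b) is not tb:
                return None, False
            # A real JIT would emit a type-specific add here.
            return a + b, True

        self.fast_path = specialized

    def __call__(self, a, b):
        self.calls += 1
        if self.fast_path is not None:
            result, ok = self.fast_path(a, b)
            if ok:
                return result
            self.bailouts += 1      # guard failed: bail to the interpreter
        elif self.calls > HOT_THRESHOLD:
            self.compile()          # the function is hot: specialize it
        return self.generic(a, b)
```

Calling an instance a handful of times with integers triggers `compile()`; a later call with strings fails the guard and falls back to `generic`, which still returns the right answer.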
PyPy
Python in Python
JIT Generator
Tracing JIT
RPython
Restricted Python
Statically typed subset of Python
Can be efficiently converted to C, JVM bytecode, CIL (.NET bytecode)
JIT Generator
Take an interpreter written in RPython
Add a few hints to the source code
Automatically generate a JIT for it
Tracing JIT
Profile code looking for hot loops
Record types seen within a loop
Generate optimized machine code for loops
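As a rough illustration of the first two steps, here is a tiny interpreter that counts backward jumps, and once a loop is hot, records one iteration as a linear trace with the operand types it saw. It is a sketch under assumptions, not PyPy's JIT: the three-instruction "bytecode", `run`, and `HOT_LOOP_THRESHOLD` are invented for the example, and no machine code is generated.

```python
HOT_LOOP_THRESHOLD = 2  # hypothetical; real tracing JITs use larger counts

def run(program, env):
    """Interpret `program`; return the trace recorded for the first hot loop."""
    back_edges = {}     # loop header pc -> number of jumps back to it
    recording = None    # loop header pc while a trace is being recorded
    trace = []
    finished = None
    pc = 0
    while pc < len(program):
        op = program[pc]
        if recording is not None:
            # Record the operation and the types of its variable operands.
            types = tuple(type(env[x]).__name__ for x in op[1:] if x in env)
            trace.append((op[0], types))
        next_pc = pc + 1
        if op[0] == "add":
            env[op[1]] = env[op[1]] + env[op[2]]
        elif op[0] == "dec":
            env[op[1]] = env[op[1]] - 1
        elif op[0] == "jump_if_nonzero":
            if env[op[1]] != 0:
                next_pc = op[2]     # backward jump: another iteration
                if recording == next_pc:
                    finished, recording = trace, None   # the loop closed
                elif finished is None and recording is None:
                    back_edges[next_pc] = back_edges.get(next_pc, 0) + 1
                    if back_edges[next_pc] >= HOT_LOOP_THRESHOLD:
                        recording = next_pc             # hot: start tracing
        pc = next_pc
    return finished

program = [
    ("add", "total", "step"),       # pc 0: loop header
    ("dec", "n"),
    ("jump_if_nonzero", "n", 0),
]
env = {"total": 0, "step": 2.5, "n": 10}
trace = run(program, env)
```

The recorded types are what a real tracing JIT would turn into guards in the generated code.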
Benchmarks
The Python Benchmark Suite
Extracted from Unladen Swallow
Used by CPython, Unladen Swallow, PyPy
CPython vs Unladen Swallow
Benchmark      CPython 2.6   Unladen Swallow   Difference
2to3           25.13 s       24.87 s           1.01
django         1.08 s        0.80 s            1.35
html5lib       14.29 s       13.20 s           1.08
nbody          0.51 s        0.28 s            1.84
rietveld       0.75 s        0.55 s            1.37
slowpickle     0.75 s        0.55 s            1.37
slowspitfire   0.83 s        0.61 s            1.36
slowunpickle   0.33 s        0.26 s            1.26
spambayes      0.31 s        0.34 s            1.10
CPython vs PyPy
Faster is Possible
Global/Builtin Lookup Caching
Loading a global takes 1 dictionary lookup.
Loading a builtin takes 2.
Globals/Builtins rarely, if ever, change.
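The lookup order described above can be written out as a short sketch. `load_global` is a hypothetical helper that mimics what the LOAD_GLOBAL opcode has to do; it is not CPython's actual code, and it raises `KeyError` where the interpreter would raise `NameError`.

```python
try:
    import builtins                     # Python 3
except ImportError:
    import __builtin__ as builtins      # Python 2

def load_global(name, module_globals):
    """Return (value, number of dictionary lookups needed to find it)."""
    lookups = 1
    try:
        return module_globals[name], lookups    # first lookup: globals
    except KeyError:
        pass
    lookups += 1
    return vars(builtins)[name], lookups        # second lookup: builtins
```

For example, `load_global("len", {})` finds the builtin after two lookups, while a module that rebinds `len` is served from its globals after one.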
In PyPy
Uses a specialized dictionary for modules, similar to V8's hidden classes.
Check that the dict has the right shape.
Read the field directly out of it.
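A minimal sketch of the shape-check idea, assuming a fixed set of names per dictionary. `Shape`, `ShapedDict`, and `make_fast_reader` are hypothetical names for illustration; this is a simplification and not how PyPy's module dictionaries are actually implemented.

```python
class Shape(object):
    """Shared layout: maps each name to a slot index."""
    def __init__(self, names):
        self.index = dict((name, i) for i, name in enumerate(names))

class ShapedDict(object):
    """Toy stand-in for a module dictionary with a fixed set of names."""
    def __init__(self, items):
        names = sorted(items)
        self.shape = Shape(names)
        self.slots = [items[name] for name in names]

    def lookup(self, name):
        # Slow path: a real hash lookup to find the slot.
        return self.slots[self.shape.index[name]]

def make_fast_reader(d, name):
    """'Compile' a reader for `name`, specialized to d's current shape."""
    expected_shape = d.shape
    slot = expected_shape.index[name]       # resolved once, ahead of time

    def read(target):
        if target.shape is expected_shape:  # check the dict has the right shape
            return target.slots[slot]       # read the field directly
        return target.lookup(name)          # different shape: slow path

    return read
```

The guard is a single identity comparison, so the common case avoids hashing the name altogether.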
In Unladen Swallow
When the JIT compiler sees a LOAD_GLOBAL opcode it does the lookup at compile time, writes the exact address of the value into the machine code, and registers a listener with the globals/builtins dictionary.
If the globals/builtins dictionary is written to, the machine code is invalidated.
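The listener scheme can be sketched in pure Python. `WatchedDict` and `CompiledCode` are hypothetical names, the cached attribute stands in for an address written into machine code, and only `__setitem__` is watched here (a real implementation would have to cover every way of mutating the dictionary).

```python
class WatchedDict(dict):
    """Dictionary that notifies registered listeners when it is written to."""
    def __init__(self, *args, **kwargs):
        self.listeners = []
        dict.__init__(self, *args, **kwargs)

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        for listener in self.listeners:
            listener.invalidate()

class CompiledCode(object):
    """Stands in for machine code with a global's value baked in."""
    def __init__(self, name, namespace):
        self.name = name
        self.namespace = namespace
        self.cached = namespace[name]       # lookup done once, at "compile" time
        self.valid = True
        namespace.listeners.append(self)    # register for write notifications

    def invalidate(self):
        self.valid = False

    def load(self):
        if self.valid:
            return self.cached              # fast path: no dictionary lookup
        return self.namespace[self.name]    # invalidated: look it up again
```

Note that any write invalidates the code, even one to an unrelated key, which matches the coarse rule on the slide and is cheap because such writes are rare.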
Inlining
Good programming practice is to split up functions.
Function calls are expensive.
Also, calls across the Python interpreter/C (or other target language) barrier are expensive.
Inlining removes argument parsing, frame creation, and “raw” function call overhead.
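To make the cost concrete, the sketch below compares a loop that calls a small helper with the same loop after inlining the helper by hand. The function names are hypothetical and the timings depend on the machine and interpreter, so no numbers are claimed; on CPython the hand-inlined version is typically the faster of the two.

```python
import timeit

def square(x):
    return x * x

def sum_squares_calls(n):
    total = 0
    for i in range(n):
        total += square(i)      # one Python-level call per iteration
    return total

def sum_squares_inlined(n):
    total = 0
    for i in range(n):
        total += i * i          # body of square() copied into the loop
    return total

# Time both versions; only the relative difference is meaningful.
with_calls = timeit.timeit(lambda: sum_squares_calls(1000), number=200)
inlined = timeit.timeit(lambda: sum_squares_inlined(1000), number=200)
```

A JIT that inlines automatically lets the code stay in the well-factored first form while running like the second.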
In Unladen Swallow
This hasn’t landed in trunk yet.
When compiling, if a CALL_FUNCTION always calls the same function, check how “expensive” that function is; if the cost is low, copy its bytecode into the caller's bytecode.
Also, at CPython compile time, turn all library functions into LLVM IR so that they can be inlined as well.
In PyPy
Tracing JIT automatically goes through all function calls.
Final operations list automatically has all calls inlined.
Library functions are compiled to jitcode (the JIT’s IR) at PyPy compile time, so they can be inlined too.
Questions? Complaints? Thrown Vegetables?