POSH Python Object Sharing

POSHPython Object Sharing

Steffen Viken ValvågIn collaboration with

Kjetil Jacobsen &Åge Kvalnes

University of Tromsø, NorwaySponsored byFast Search & Transfer

Python Execution Model

for x in "TEST PROGRAM": if x not in "FORGET IT": print x,

Test.py

0 SETUP_LOOP 3 LOAD_CONST ('TEST PROGRAM') 6 GET_ITER 7 FOR_ITER10 STORE_FAST (x)13 LOAD_FAST (x)16 LOAD_CONST ('FORGET IT')19 COMPARE_OP (not in)22 JUMP_IF_FALSE (to 33)25 POP_TOP26 LOAD_FAST (x)29 PRINT_ITEM30 JUMP_FORWARD (to 34)33 POP_TOP34 JUMP_ABSOLUTE37 POP_BLOCK

Test.pyc

OutputSPAM

Byte-code compilation

Interpretation

Python Threading ModelThread A

Bytecodes

Thread B

GIL

Each thread executes a separate sequence of byte codes All threads must contend for one global interpreter lock

Example: Matrix Multiplication

Performs a matrix multiplication A = B x C

The work is split between several worker threads

The application runs on a machine with 8 CPUs

0

100

200

300

400

500

1 2 3 4 5 6 7 8Number of workers

Tim

e IdealThreads

Threads do not scale for multiple CPUs due to lock contention on the GIL

Workaround: Processes

Process A Process B

IPC

Each process has its own interpreter lock Requires inter-process communication, using e.g.

message passing by means of pipes

Matrix Multiplication using Processes

A master process distributes the input matrices to a set of worker processes

Each worker process computes some part of the output matrix, and returns its result to the master

The master process assembles the final result matrix

More communication, and more complex pattern than using threads

Ways Ahead

Communication through standard Python container objects favors threads

Scalability on multiprocessor architectures favors processes

The GIL is here to stay, so making threads scale better is hard

However, there might be room for improvement of inter-process communication mechanisms

Using Shared Memory for IPC

Process B

Shared Memory

Process A

Processes communicate by accessing a shared memory region

Requires explicit synchronization and data marshalling, imposes a flat data structure

Using POSH for IPC

Process B

Shared Memory

Process A

Allocates regular Python objects in shared memory Shared objects are accessed transparently through

regular method calls

XL

X.method1()

L.extend([X, Y])

Y

IPC is done by modifying shared, mutable objects

Complications

Processes must synchronize their access to shared, mutable objects (just like threads)

Explicit synchronization of critical regions must be possible, while implicit synchronization upon accessing shared objects is desireable

Python’s regular garbage collection algorithm is inadequate for shared objects, which may be referenced by multiple processes

Proxy Objects

Shared ObjectProxy Object

X

Provides transparent access to a shared object by forwarding all attribute accesses and method calls

Provides a single entry point to a shared object, where synchronization policies may be enforced

X.method1()

return value return value

X.method2()

Multi-Process Garbage Collection

Must account for references from all live processes

Must stay up-to-date when processes fork, as this may create new references to shared objects

Should be able to handle abnormal process termination without leaking shared objects

Garbage Collection in POSH

Process A Shared Memory Process B

X

Y L

Shared Object Proxy Object

M

Regular Python referenceReference from a process to a shared object (type I)Reference from one shared object to another (type II)

Garbage Collection Details

POSH creates at most one proxy object per process for any given shared object

Shared objects are always referenced through their proxy objects

A bitmap in each shared object records the processes that have a corresponding proxy object. This tracks references of type I (from a process)

A separate count in each shared object records the number of references to the object from other shared objects. This tracks references of type II

Shared objects are deleted when there are no references to them of either type

Performance

Performing a matrix multiplication A = B x C using POSH

The work is split between several worker processes

The application runs on a machine with 8 CPUs

More overhead, but scales for multiple CPUs

0

100

200

300

400

500

1 2 3 4 5 6 7 8Number of workers

Tim

e IdealThreadsPOSH

Summary

Python uses a global interpreter lock (GIL) to serialize execution of byte codes

This entails a lack of scalability on multiprocessor architectures for CPU-intensive multi-threaded apps

However, threads offer an attractive programming model, with implicit communication

Processes + shared memory reduce IPC overheads, but normally impose flat data structures and require data marshalling

POSH uses processes + shared memory to offer a programming model similar to threads, with the scalability of processes

Availability

Open source, hosted at SourceForge http://poshmodule.sf.net/ Still not very stable Developers wanted

http://poshmodule.sf.net/

Example Usage

import posh

class Stuff(object): pass

posh.allow_sharing(Stuff, posh.generic_init)

mystuff = posh.share(Stuff())

def worker1(): mystuff.money = 0def worker2(): mystuff.debt = 100000def worker3(): mystuff.balance = mystuff.money - mystuff.debt

for w in worker1, worker2, worker3: posh.forkcall(w)posh.waitall()

Documents

POSH Python Object Sharing