Inter-Process/Task Communication With Message Queues
William McVey, PyOhio
July 26, 2009
Intro
How I found a solution that works well for me
There is a LOT of material out there that isn't covered
Not necessarily the ideal solution, but I learned a lot along the way
Description of the Problem
HPC Controller: tries to discover new ways that web browsers (and other client software) get exploited "in the wild" and ensures that my employer's mitigations for these threats are effective.
A Django-based data management application
Invokes long running Capture-HPC Java application
Collects and processes large amounts of data
Architecture
Key Difficulties
Long-running processes launched from short-lived web requests.
My initial (naive) approach:
Spawn detached processes to handle jobs
Process coordination via database
Lesson learned
Do not screw with Apache's process model.
Rediscovering Queues
Basic queue overview
Standard lib:
Queue - mostly for thread pool management
collections.deque - provides efficient appends and pops at both ends of a list-like structure
heapq - ordered queues (e.g. priority queue)
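A minimal sketch of the three standard-library tools above, written for Python 3 (where the `Queue` module is spelled `queue`). The function names are hypothetical, chosen for illustration only: `queue.Queue` feeds a small thread pool, `collections.deque` is used from both ends, and `heapq` turns a plain list into a priority queue.

```python
import heapq
import queue
import threading
from collections import deque


def thread_pool_squares(numbers, num_threads=3):
    """queue.Queue: a thread-safe FIFO feeding a small pool of worker threads."""
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            n = tasks.get()
            if n is None:          # sentinel: this worker should exit
                return
            results.put(n * n)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for n in numbers:
        tasks.put(n)
    for _ in threads:
        tasks.put(None)            # one sentinel per worker
    for t in threads:
        t.join()                   # every result has been queued once all workers exit
    return sorted(results.get() for _ in numbers)


def deque_ends():
    """collections.deque: O(1) appends and pops at either end."""
    d = deque([2, 3])
    d.appendleft(1)                # deque([1, 2, 3])
    d.append(4)                    # deque([1, 2, 3, 4])
    return d.popleft(), d.pop(), list(d)


def priority_order(jobs):
    """heapq: jobs are (priority, name) pairs; the lowest priority number pops first."""
    heap = []
    for priority, name in jobs:
        heapq.heappush(heap, (priority, name))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


print(thread_pool_squares([1, 2, 3, 4]))               # [1, 4, 9, 16]
print(deque_ends())                                    # (1, 4, [2, 3])
print(priority_order([(3, "c"), (1, "a"), (2, "b")]))  # ['a', 'b', 'c']
```

All three only work inside a single process, which is exactly the limitation that pushes the rest of this talk toward message brokers.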
Generic message broker
Message brokers can provide:
Simple queue-like dataflow
Simplified interprocess communication with message routing
More effective scaling
Better resilience to failure
beanstalkd/beanstalkc
beanstalkd: a very simple text-based protocol with a simple yet powerful set of queue management primitives. http://xph.us/software/beanstalkd/
beanstalkc: a simple yet powerful client API that is well documented. http://github.com/earl/beanstalkc/
[demo here]
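The demo itself is not reproduced here. As a rough stand-in, the sketch below models the job lifecycle behind beanstalkd's primitives (put, reserve, delete, release, bury, kick) in memory. `ToyTube` and its default priority are hypothetical and this is not the beanstalkc API: a real `reserve` blocks until a job is ready, and time-to-run expiry and delayed jobs are left out.

```python
import heapq
import itertools


class ToyTube:
    """Hypothetical in-memory model of one beanstalkd tube's job lifecycle.

    NOT the beanstalkc API: reserve() returns None instead of blocking,
    and time-to-run (TTR) expiry and delays are not modelled.
    """

    def __init__(self):
        self._ready = []                  # heap of (priority, job_id); lower number = more urgent
        self._ids = itertools.count(1)    # ids increase, so equal priorities pop FIFO
        self._jobs = {}                   # job_id -> (priority, body)
        self._reserved = set()
        self._buried = set()

    def put(self, body, priority=100):
        """Add a ready job and return its id (the default priority is arbitrary)."""
        job_id = next(self._ids)
        self._jobs[job_id] = (priority, body)
        heapq.heappush(self._ready, (priority, job_id))
        return job_id

    def reserve(self):
        """Hand the most urgent ready job to a worker, or None if nothing is ready."""
        if not self._ready:
            return None
        _, job_id = heapq.heappop(self._ready)
        self._reserved.add(job_id)
        return job_id, self._jobs[job_id][1]

    def delete(self, job_id):
        """The worker finished the job: forget it entirely."""
        self._reserved.discard(job_id)
        self._buried.discard(job_id)
        del self._jobs[job_id]

    def release(self, job_id, priority=None):
        """The worker hands the job back to the ready queue, optionally re-prioritised."""
        self._reserved.remove(job_id)
        old_priority, body = self._jobs[job_id]
        new_priority = old_priority if priority is None else priority
        self._jobs[job_id] = (new_priority, body)
        heapq.heappush(self._ready, (new_priority, job_id))

    def bury(self, job_id):
        """Park a failing job where no worker will reserve it until it is kicked."""
        self._reserved.remove(job_id)
        self._buried.add(job_id)

    def kick(self):
        """Move every buried job back to the ready queue; return how many moved."""
        for job_id in sorted(self._buried):
            heapq.heappush(self._ready, (self._jobs[job_id][0], job_id))
        kicked = len(self._buried)
        self._buried.clear()
        return kicked
```

A typical sequence: a worker reserves the urgent job, hits a problem and buries it, handles the other job, and an operator later kicks the buried job back into play.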
The need for something more
Beanstalkd continues to be effective for hpc_controller. A new project came along and I ran into some issues...
Lack of authentication
Lack of message integrity/confidentiality
Lack of persistent messages
memcacheq
Memcacheq uses the memcachedb protocol to implement queues: a "cache" lookup of a queue name pops a value from that queue.
Pro:
Fast, lightweight, and scales well.
Persistent messages across reboots
Con:
Doesn't support either blocking or callback interfaces
Have to poll to see if you have messages
Didn't address authentication requirement
[demo here]
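Because memcacheq has neither a blocking nor a callback interface, a consumer ends up in a polling loop. The sketch below assumes only that `client.get(name)` returns `None` when the queue is empty, as a memcache client does on a cache miss. `FakeMemcacheQ` is a hypothetical dict-backed stand-in so the example runs without a server, and the backoff values are arbitrary.

```python
import time


class FakeMemcacheQ:
    """Hypothetical stand-in for a memcache client talking to memcacheq:
    set() appends a message to the named queue, get() pops the oldest one."""

    def __init__(self):
        self._queues = {}

    def set(self, queue_name, message):
        self._queues.setdefault(queue_name, []).append(message)
        return True

    def get(self, queue_name):
        q = self._queues.get(queue_name)
        return q.pop(0) if q else None      # None plays the role of a cache miss


def poll(client, queue_name, max_wait=1.0, interval=0.01):
    """Poll until a message arrives or max_wait seconds pass (no blocking get exists)."""
    deadline = time.monotonic() + max_wait
    while True:
        message = client.get(queue_name)
        if message is not None:
            return message
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
        interval = min(interval * 2, 0.5)   # back off so an idle consumer doesn't hammer the server


mc = FakeMemcacheQ()
mc.set("jobs", "job-1")
print(poll(mc, "jobs"))                     # job-1
print(poll(mc, "jobs", max_wait=0.05))      # None, after a few short sleeps
```

The trade-off is latency versus load: a short interval notices new messages sooner but costs a round trip per empty check.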
AMQP
Advanced Message Queuing Protocol (AMQP): an open protocol for message queuing.
Pro:
A more powerful message routing capability
TLS (aka SSL) as part of the protocol spec
A variety of broker implementations
Con:
More complex
AMQP Message Routing
Image from: Messaging Tutorial - AMQP Programming Tutorial for C++, Java, Python, and C#. Copyright 2008 Red Hat, Inc. Under the Open Publication License.
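To make the routing model concrete: in AMQP, publishers send to an exchange with a routing key, and the exchange copies the message into every queue whose binding matches. The sketch below is a hypothetical in-memory toy (`ToyExchange` and `topic_matches` are illustrative names, not a client library) covering the three common exchange types: direct (exact key match), fanout (every bound queue), and topic, where keys are dot-separated words, `*` matches exactly one word and `#` matches zero or more.

```python
from collections import deque


def topic_matches(binding_key, routing_key):
    """AMQP-style topic match: '*' = exactly one word, '#' = zero or more words."""
    pattern, words = binding_key.split("."), routing_key.split(".")

    def match(i, j):
        if i == len(pattern):
            return j == len(words)
        if pattern[i] == "#":
            # '#' may swallow any number of words, including none
            return any(match(i + 1, k) for k in range(j, len(words) + 1))
        if j == len(words):
            return False
        if pattern[i] in ("*", words[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


class ToyExchange:
    """Hypothetical in-memory exchange: routes each message to matching bound queues."""

    def __init__(self, kind):
        if kind not in ("direct", "fanout", "topic"):
            raise ValueError("unknown exchange type: %r" % kind)
        self.kind = kind
        self.bindings = []                     # list of (binding_key, queue)

    def bind(self, queue, binding_key=""):
        self.bindings.append((binding_key, queue))

    def _matches(self, binding_key, routing_key):
        if self.kind == "fanout":
            return True
        if self.kind == "direct":
            return binding_key == routing_key
        return topic_matches(binding_key, routing_key)

    def publish(self, routing_key, body):
        """Deliver one copy per matching queue; return the number of queues reached."""
        targets = []
        for binding_key, queue in self.bindings:
            if self._matches(binding_key, routing_key) and all(queue is not t for t in targets):
                targets.append(queue)
        for queue in targets:
            queue.append(body)
        return len(targets)


# Direct routing, reusing the names from the carrot sample later in this talk
po_box = deque()
sorting_room = ToyExchange("direct")
sorting_room.bind(po_box, "jason")
sorting_room.publish("jason", {"My message": ["foo", "bar", "baz"]})
print(list(po_box))   # [{'My message': ['foo', 'bar', 'baz']}]
```

A message whose routing key matches no binding is simply dropped here; a real broker offers options (such as the mandatory flag) for that case.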
ØMQ (ZeroMQ) - http://zeromq.org/
High-performance messaging broker that can speak AMQP; alternatively, you can use its own set of Python bindings to communicate via the library code.
Pro:
A more flexible set of possible topologies (including brokerless/peer-to-peer, directory referral, and more).
Con:
Misguided 'fail fast' implementation within the library
RabbitMQ
RabbitMQ is conformant to the AMQP spec and provided the features I needed:
TLS protected communication
Authentication / Authorization
High reliability
Persistent messages
The broker is implemented in Erlang, but the implementation language doesn't matter, since on the client side there is py-amqplib.
amqplib / carrot
py-amqplib is a client library for the AMQP protocol. It is fairly low level for my needs, though, so a little digging turned up carrot.
carrot sample
>>> from carrot.messaging import Publisher, Consumer
>>> class PostOfficePublisher(Publisher):
... exchange = "sorting_room"
... routing_key = "jason"
>>> class PostOfficeConsumer(Consumer):
... queue = "po_box"
... exchange = "sorting_room"
... routing_key = "jason"
...
... def receive(self, message_data, message):
... """Called when we receive a message."""
... print("Received: %s" % message_data)
>>> from ConfigParser import ConfigParser
>>> config = ConfigParser()
>>> config.read("application.ini")
>>> from carrot.connection import AMQPConnection
>>> amqpconn = AMQPConnection(
... hostname = config.get("broker", "host"),
... port = config.getint("broker", "port"),
... userid = config.get("broker", "userid"),
... password = config.get("broker", "password"),
... vhost = config.get("broker", "vhost"))
>>> PostOfficePublisher(connection=amqpconn).send(
... {"My message": ["foo", "bar", "baz"]})
>>> PostOfficeConsumer(connection=amqpconn).next()
Received: {"My message": ["foo", "bar", "baz"]}
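The sample reads its broker settings from `application.ini` but never shows that file. Judging only from the `config.get("broker", ...)` calls, it needs a `[broker]` section with five keys. The sketch below (Python 3 `configparser`, the renamed Python 2 `ConfigParser`) shows one plausible file and parses it. The values are placeholders: 5672 is the standard AMQP port, and `guest`/`guest` with vhost `/` are RabbitMQ's out-of-the-box defaults. `broker_settings` is a hypothetical helper whose keys mirror the keyword arguments passed to `AMQPConnection` above.

```python
import configparser

# Placeholder contents for the application.ini the carrot sample reads.
SAMPLE_INI = """\
[broker]
host = localhost
port = 5672
userid = guest
password = guest
vhost = /
"""


def broker_settings(text):
    """Parse the [broker] section into the keyword arguments used by the sample."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return {
        "hostname": config.get("broker", "host"),
        "port": config.getint("broker", "port"),   # ports are integers, not strings
        "userid": config.get("broker", "userid"),
        "password": config.get("broker", "password"),
        "vhost": config.get("broker", "vhost"),
    }


print(broker_settings(SAMPLE_INI))
```

Keeping credentials in a config file rather than in code matters here, since authentication was one of the reasons for moving to RabbitMQ in the first place.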
multiprocessing
Part of the Python 2.6 standard library. Its main intent is to provide a process-based alternative to the threading library. It provides some process coordination facilities, including a Queue object and a network-aware interprocess Manager object.
Pro:
Part of the standard library (2.6 and beyond)
Con:
Pretty low level
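A minimal sketch of `multiprocessing.Queue` coordinating worker processes (Python 3). The worker and helper names are hypothetical. It explicitly requests the `fork` start method, which exists only on POSIX systems, so the sketch also runs when pasted into a single file or interactive session; on Windows you would use the default context and keep the `if __name__ == "__main__":` guard.

```python
import multiprocessing


def square_worker(tasks, results):
    """Pull numbers from `tasks` until a None sentinel arrives; push (n, n*n) pairs."""
    for n in iter(tasks.get, None):
        results.put((n, n * n))


def run_jobs(numbers, num_workers=2):
    ctx = multiprocessing.get_context("fork")   # POSIX-only start method (see lead-in)
    tasks = ctx.Queue()
    results = ctx.Queue()
    workers = [ctx.Process(target=square_worker, args=(tasks, results))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for n in numbers:
        tasks.put(n)
    for _ in workers:
        tasks.put(None)                         # one sentinel per worker
    # Drain results before join(): joining a process that still has queued
    # data waiting to be flushed can deadlock.
    collected = dict(results.get() for _ in numbers)
    for w in workers:
        w.join()
    return collected


if __name__ == "__main__":
    print(run_jobs([1, 2, 3, 4]))   # arrival order varies between runs
```

This also illustrates the "pretty low level" point: sentinels, draining order and start methods are all left to the caller.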
In Summary
I like beanstalkc.
I like AMQP (specifically RabbitMQ) along with the carrot API
Memcacheq would work well if all you need to do is hold jobs until you can process them in batches
Multiprocessing is worth a look
I've only scratched the surface (Kamaelia, sprinkle/STOMP, etc)