Inter-Process/Task Communication With Message Queues


William McVey PyOhio
July 26, 2009

Intro

How I found a solution that works well for me

There is a LOT of material out there that isn't covered

Not necessarily the ideal solution, but I learned a lot along the way

Description of the Problem

HPC Controller: Tries to discover new ways web browsers (and other client software) get exploited "in the wild" and ensures that my employer's mitigations for these threats are effective.

A Django-based data management application

Invokes long running Capture-HPC Java application

Collects and processes large amounts of data

Architecture

Key Difficulties

Long running processes under short lived web requests.

My initial (naive) approach:

Spawn detached processes to handle jobs

Process coordination via database

Lesson learned

Do not screw with Apache's process model.

Rediscovering Queues

Basic queue overview

Standard lib:

Queue - mostly for thread pool management

collections.deque - provides efficient access to both ends of a list-like structure

heapq - ordered queues (e.g. priority queue)
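As a rough illustration of the three standard-library options above, the sketch below pushes a tiny job list through each one. It is written for Python 3, where the `Queue` module is named `queue`; the job names are made up for the example.

```python
import heapq
import queue
from collections import deque

# queue.Queue: thread-safe FIFO, typically shared between worker threads.
fifo = queue.Queue()
for job in ("crawl", "parse", "report"):
    fifo.put(job)
fifo_order = [fifo.get() for _ in range(3)]

# collections.deque: cheap appends and pops at both ends.
d = deque(["parse"])
d.appendleft("crawl")   # add at the left end
d.append("report")      # add at the right end
first, last = d.popleft(), d.pop()

# heapq: a plain list kept in heap order; the smallest priority pops first.
heap = []
heapq.heappush(heap, (2, "parse"))
heapq.heappush(heap, (1, "crawl"))
heapq.heappush(heap, (3, "report"))
priority_order = [heapq.heappop(heap)[1] for _ in range(3)]
```

All three live inside a single process, which is why they stop being enough once jobs have to cross process or machine boundaries.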

Generic message broker

Message brokers can provide:

Simple queue-like dataflow

Simplified interprocess communication with message routing

More effective scaling

Better resilience to failure

beanstalkd/beanstalkc

beanstalkd: A very simple text-based protocol with a simple yet powerful set of queue management primitives. http://xph.us/software/beanstalkd/

beanstalkc: A simple yet powerful client API that is well documented. http://github.com/earl/beanstalkc/

[demo here]
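To give a feel for how simple the text-based protocol is, here is a minimal sketch that builds a `put` command and splits a one-line reply by hand. The helper names `encode_put` and `parse_status` are hypothetical, and the default priority, delay, and time-to-run values are arbitrary choices for illustration. In practice beanstalkc does this work for you.

```python
def encode_put(body, priority=1024, delay=0, ttr=60):
    """Build a beanstalkd 'put' command for a bytes payload.

    Wire format: put <pri> <delay> <ttr> <bytes> CRLF <data> CRLF
    """
    header = "put %d %d %d %d\r\n" % (priority, delay, ttr, len(body))
    return header.encode("ascii") + body + b"\r\n"


def parse_status(line):
    """Split a one-line reply into (status_word, [arguments])."""
    parts = line.decode("ascii").strip().split(" ")
    return parts[0], parts[1:]


command = encode_put(b"hello")
status, args = parse_status(b"INSERTED 42\r\n")
```

A worker follows the same pattern with `reserve` and `delete <id>`: it reserves a job, processes it, and deletes it once the work is done.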

The need for something more

Beanstalkd continues to be effective for hpc_controller. A new project came along and I ran into some issues...

Lack of authentication

Lack of message integrity/confidentiality

Lack of persistent messages

memcacheq

Memcacheq uses the memcached protocol to implement queues: a "cache" lookup of a queue name pops a value from the queue.

Pro:

Fast, lightweight, and scales well.

Persistent messages across reboots

Con:

Doesn't support either blocking or callback interfaces

Have to poll to see if you have messages

Didn't address authentication requirement

[demo here]
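The polling drawback can be sketched as follows. `FakeQueueClient` is a hypothetical in-memory stand-in for a memcache-style client (it is not a real library), so the example runs without a server. The `poll` helper, its interval, and its attempt limit are likewise assumptions for illustration.

```python
import time
from collections import deque


class FakeQueueClient:
    """Hypothetical in-memory stand-in for a memcache-style queue client."""

    def __init__(self):
        self._queues = {}

    def set(self, name, value):
        # A 'set' on a queue name appends a message to that queue.
        self._queues.setdefault(name, deque()).append(value)

    def get(self, name):
        # A 'get' on a queue name pops the oldest message, or returns None.
        q = self._queues.get(name)
        return q.popleft() if q else None


def poll(client, name, interval=0.01, max_attempts=5):
    """Ask repeatedly for a message, since there is no blocking call."""
    for _ in range(max_attempts):
        message = client.get(name)
        if message is not None:
            return message
        time.sleep(interval)
    return None


client = FakeQueueClient()
client.set("jobs", "first")
client.set("jobs", "second")
```

Every empty `get` is a wasted round trip, and the interval trades latency against load on the server.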

AMQP

Advanced Message Queuing Protocol (AMQP): an open protocol layer for message queues.

Pro:

A more powerful message routing capability

TLS (aka SSL) as part of the protocol spec

A variety of broker implementations

Con:

More complex

AMQP Message Routing

Image from: Messaging Tutorial - AMQP Programming Tutorial for C++, Java, Python, and C#. Copyright 2008 Red Hat, Inc. Under the Open Publication License.
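AMQP routing can be sketched in a few lines: a publisher sends to an exchange with a routing key, and the exchange type decides which bound queues get a copy. The code below is a simplified in-memory model, not a broker or client API. The function names and the example bindings are assumptions for illustration.

```python
def _match(pattern_words, key_words):
    """Match topic pattern words against routing key words."""
    if not pattern_words:
        return not key_words
    head, rest = pattern_words[0], pattern_words[1:]
    if head == "#":
        # '#' may absorb zero or more words.
        return any(_match(rest, key_words[i:])
                   for i in range(len(key_words) + 1))
    if not key_words:
        return False
    if head == "*" or head == key_words[0]:
        # '*' stands for exactly one word.
        return _match(rest, key_words[1:])
    return False


def topic_matches(pattern, routing_key):
    return _match(pattern.split("."), routing_key.split("."))


def route(exchange_type, bindings, routing_key):
    """Return the queues that receive a message.

    bindings is a list of (queue_name, binding_key) pairs.
    """
    if exchange_type == "fanout":
        return [q for q, _ in bindings]
    if exchange_type == "direct":
        return [q for q, key in bindings if key == routing_key]
    if exchange_type == "topic":
        return [q for q, key in bindings if topic_matches(key, routing_key)]
    raise ValueError("unknown exchange type: %r" % exchange_type)


bindings = [("po_box", "jason"), ("audit", "#"), ("eu_mail", "mail.*.eu")]
```

With these bindings, a direct exchange delivers key `jason` only to `po_box`, while a topic exchange delivers `mail.urgent.eu` to both `audit` and `eu_mail`.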

ØMQ (ZeroMQ) - http://zeromq.org/

High performance messaging broker which can speak AMQP, or you can use its own set of Python bindings to communicate via the library code.

Pro:

More flexible set of possible topologies (including brokerless/peer-to-peer, directory referral, and more).

Con:

Misguided 'fail fast' implementation within the library

RabbitMQ

RabbitMQ conforms to the AMQP spec and provides the features I needed:

TLS protected communication

Authentication / Authorization

High reliability

Persistent messages

The broker is implemented in Erlang, but the implementation language doesn't matter, since the client side has py-amqplib.

amqplib / carrot

py-amqplib is a client library around the AMQP protocol. It is fairly low level for my needs, though, so a little digging found carrot.

carrot sample

>>> from carrot.messaging import Publisher, Consumer
>>> class PostOfficePublisher(Publisher):
...     exchange = "sorting_room"
...     routing_key = "jason"

>>> class PostOfficeConsumer(Consumer):
...     queue = "po_box"
...     exchange = "sorting_room"
...     routing_key = "jason"
...
...     def receive(self, message_data, message):
...         """Called when we receive a message."""
...         print("Received: %s" % message_data)

carrot sample (continued)

>>> from ConfigParser import ConfigParser
>>> config = ConfigParser()
>>> config.read("application.ini")

>>> from carrot.connection import AMQPConnection
>>> amqpconn = AMQPConnection(
...     hostname=config.get("broker", "host"),
...     port=config.get("broker", "port"),
...     userid=config.get("broker", "userid"),
...     password=config.get("broker", "password"),
...     vhost=config.get("broker", "vhost"))

>>> PostOfficePublisher(connection=amqpconn).send(
...     {"My message": ["foo", "bar", "baz"]})

>>> PostOfficeConsumer(connection=amqpconn).next()
Received: {"My message": ["foo", "bar", "baz"]}
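The sample reads its broker settings from `application.ini`, which the slides do not show. The sketch below guesses at a plausible layout from the `config.get` calls: a `[broker]` section with `host`, `port`, `userid`, `password`, and `vhost`. The values are placeholders (5672 is the usual AMQP port and guest/guest is RabbitMQ's out-of-the-box account), and the code uses the Python 3 module name `configparser`.

```python
from configparser import ConfigParser

# Assumed contents of application.ini; every value here is a placeholder.
SAMPLE_INI = """
[broker]
host = localhost
port = 5672
userid = guest
password = guest
vhost = /
"""

config = ConfigParser()
config.read_string(SAMPLE_INI)

# Collect the settings under the keyword names used in the sample above.
broker = {
    "hostname": config.get("broker", "host"),
    "port": config.getint("broker", "port"),  # getint converts the string
    "userid": config.get("broker", "userid"),
    "password": config.get("broker", "password"),
    "vhost": config.get("broker", "vhost"),
}
```

Keeping credentials in a config file rather than in the source also makes it easier to use different brokers for development and production.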

multiprocessing

Part of the Python 2.6 standard library. Main intent is to provide a process alternative to the threading library. Provides some process coordination facilities, including a Queue object and a network-aware interprocess Manager object.

Pro:

Part of standard library (2.6 and beyond)

Con:

Pretty low level
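For a sense of what "pretty low level" means, here is a minimal sketch of a job queue built on `multiprocessing`. It assumes a POSIX system where the "fork" start method is available. The function names and the squaring "job" are made up for the example.

```python
import multiprocessing as mp


def worker(jobs, results):
    """Consume jobs until the None sentinel arrives."""
    for job in iter(jobs.get, None):
        results.put((job, job * job))


def run_jobs(numbers):
    # Assumption: "fork" is available (Linux, macOS); it is not on Windows.
    ctx = mp.get_context("fork")
    jobs, results = ctx.Queue(), ctx.Queue()
    proc = ctx.Process(target=worker, args=(jobs, results))
    proc.start()
    for n in numbers:
        jobs.put(n)
    jobs.put(None)  # tell the worker to stop
    # Drain the results before joining so the worker is never blocked.
    out = [results.get(timeout=10) for _ in numbers]
    proc.join(timeout=10)
    return out
```

Calling `run_jobs([1, 2, 3])` hands three jobs to one worker process and collects the results in order. Sentinels, shutdown, and error handling are all left to you, which is the trade-off against a broker.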

In Summary

I like beanstalkc.

I like AMQP (specifically RabbitMQ) along with carrot API

Memcacheq would work well if all you need to do is cache jobs until you can process in batch

Multiprocessing is worth a look

I've only scratched the surface (Kamaelia, sprinkle/STOMP, etc.)