Adam Hitchcock @NorthIsUp Scaling Realtime at DISQUS Sunday, 17 March, 13

Scaling Realtime at DISQUS

Page 1: Scaling Realtime at DISQUS

Adam Hitchcock @NorthIsUp

Scaling Realtime at DISQUS

Page 4: Scaling Realtime at DISQUS

If this is interesting to you...

we’re hiring: disqus.com/jobs

Page 5: Scaling Realtime at DISQUS

what is DISQUS?

Page 7: Scaling Realtime at DISQUS

why do realtime?

๏ getting new data to the user asap
๏ for increased engagement
๏ and it looks awesome
๏ and we can sell (or trade) it

Page 9: Scaling Realtime at DISQUS

DISQUS sees a lot of traffic

Google Analytics: Feb 2013 - March 2012

Page 10: Scaling Realtime at DISQUS

realertime

๏ currently active on all DISQUS sites
๏ tested ‘dark’ on our existing network
๏ during testing:
    ๏ 1.5 million concurrently connected users
    ๏ 45 thousand new connections per second
    ๏ 165 thousand messages/second
    ๏ < 0.2 seconds latency end to end

Page 11: Scaling Realtime at DISQUS

so, how did we do it?

Page 12: Scaling Realtime at DISQUS

Node.js and MongoDB!

Page 13: Scaling Realtime at DISQUS

Node.js and MongoDB!

Page 14: Scaling Realtime at DISQUS

This is PyCon. We used Python.

Page 15: Scaling Realtime at DISQUS

and some other Technology You Know™

Page 16: Scaling Realtime at DISQUS

thoonk redis queue
some python glue
nginx push stream
and long(er) polling

Page 17: Scaling Realtime at DISQUS

architecture overview

Page 18: Scaling Realtime at DISQUS

old-june

[diagram: DISQUS → New Posts → memcache; DISQUS embed clients poll memcache every 5 seconds]

Page 19: Scaling Realtime at DISQUS

june-july

[diagram components: DISQUS, New Posts, redis pub/sub, Flask FE cluster, HA Proxy, DISQUS embed clients]

Page 20: Scaling Realtime at DISQUS

july-october

[diagram components: DISQUS, New Posts, redis pub/sub, “python glue” Gevent server, redis queue, Flask FE cluster, HA Proxy, DISQUS embed clients]

Page 21: Scaling Realtime at DISQUS

august-october

[diagram components: DISQUS, New Posts, redis pub/sub, “python glue” Gevent server, redis queue, Flask FE cluster, HA Proxy, DISQUS embed clients]

[server counts shown on the diagram: 2; 14 BIG; 6 servers; 5 servers]

Page 22: Scaling Realtime at DISQUS

august-october

[diagram components: DISQUS, New Posts, redis pub/sub, “python glue” Gevent server, redis queue, Flask FE cluster, HA Proxy, DISQUS embed clients]

[server counts shown on the diagram: 2; 6 servers; 5 servers; 2 for; 14 BIG]

lots of servers, we can do better

Page 23: Scaling Realtime at DISQUS

october-now

[diagram components: DISQUS, New Posts, redis queue, “python glue” Gevent server, http post, nginx pub endpoint, nginx + push stream module, DISQUS embed clients]

Page 24: Scaling Realtime at DISQUS

october-now

[diagram components: DISQUS, New Posts, redis queue, “python glue” Gevent server, http post, nginx pub endpoint, nginx + push stream module, DISQUS embed clients]

[server counts shown on the diagram: 2; 5]

Why still 5 for this? Network memory restriction; we can’t fix this without kernel hacking, tweaking, etc.

(If you know how, tell us, then apply for a job, then fix it for us.)

Page 25: Scaling Realtime at DISQUS

october-now

[diagram components: django, New Posts, thoonk queue, Formatter, Publishers, http post, nginx pub endpoint, other realtime stuff, nginx + push stream module, DISQUS embed clients]

Page 27: Scaling Realtime at DISQUS

the thoonk queue

๏ django post_save and post_delete hooks
๏ thoonk is a queue on top of redis
๏ implemented as a DFA
๏ provides job semantics
    ๏ useful for end to end acking
    ๏ reliable job processing in distributed system
๏ did I mention it’s on top of redis?
    ๏ uses zset to store items == ranged queries
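
The job semantics described above can be sketched without Redis at all. What follows is a minimal in-memory model, not Thoonk's actual API: the class name `JobQueue`, its methods, and the three states (available, claimed, finished) are assumptions made for illustration. It shows why an explicit ack is useful: a job that was claimed but never finished can be found by its claim time and retried instead of being lost.

```python
import itertools


class JobQueue:
    """Toy job queue (hypothetical): available -> claimed -> finished (acked)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.available = []   # job ids waiting for a worker, FIFO
        self.claimed = {}     # job id -> time the job was claimed
        self.payloads = {}    # job id -> payload

    def put(self, payload):
        job_id = next(self._ids)
        self.payloads[job_id] = payload
        self.available.append(job_id)
        return job_id

    def get(self, now):
        """Claim the oldest available job; it stays tracked until acked."""
        job_id = self.available.pop(0)
        self.claimed[job_id] = now
        return job_id, self.payloads[job_id]

    def finish(self, job_id):
        """Ack: the worker is done, so the job can be forgotten."""
        del self.claimed[job_id]
        del self.payloads[job_id]

    def retry_stalled(self, now, timeout):
        """Requeue claimed jobs whose worker has not acked within `timeout`."""
        stalled = [j for j, t in self.claimed.items() if now - t > timeout]
        for job_id in stalled:
            del self.claimed[job_id]
            self.available.append(job_id)
        return stalled


queue = JobQueue()
queue.put({"post": 1})
queue.put({"post": 2})

job_id, payload = queue.get(now=0)   # worker A claims job 1...
queue.finish(job_id)                 # ...and acks it

job_id, payload = queue.get(now=0)   # worker B claims job 2, then dies
requeued = queue.retry_stalled(now=30, timeout=10)
```

The scan in `retry_stalled` is a range query over claim times, which is the kind of lookup a sorted set makes cheap; mapping it onto a Redis zset is my reading of the last bullet, not something the slides spell out.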

Page 29: Scaling Realtime at DISQUS

the python glue

๏ listens to a thoonk queue
๏ cleans & formats message
    ๏ this is the final format for end clients
๏ compress data now
๏ publish message to nginx and other firehoses
    ๏ forum:id, thread:id, user:id, post:id

[diagram labels: Formatter, Publishers]
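
The formatter and publisher steps above can be sketched as follows. This is an illustration under assumptions, not the DISQUS code: the field names in `post`, the choice of `zlib` for compression, and the `publish` callback are all hypothetical. The point it shows is the ordering: format and compress once, then send the same bytes to every `type:id` channel.

```python
import json
import zlib


def channels_for(post):
    """Channel names in the type:id form listed on the slide."""
    return [
        "forum:%s" % post["forum"],
        "thread:%s" % post["thread"],
        "user:%s" % post["user"],
        "post:%s" % post["id"],
    ]


def format_message(post):
    """Serialize once into the final client format (compact JSON, UTF-8)."""
    # Hypothetical whitelist: private fields such as "ip" never reach clients.
    public = {k: post[k] for k in ("id", "forum", "thread", "user", "body")}
    return json.dumps(public, separators=(",", ":"), sort_keys=True).encode("utf-8")


def fan_out(post, publish):
    """Format and compress once, then hand the same bytes to every channel."""
    body = zlib.compress(format_message(post))
    for channel in channels_for(post):
        publish(channel, body)


sent = []
post = {"id": 42, "forum": "cnn", "thread": 907824578,
        "user": "northisup", "body": "hello", "ip": "10.0.0.1"}
fan_out(post, lambda channel, body: sent.append((channel, body)))
```

Doing the expensive work before the fan-out means the cost is paid once per post rather than once per channel or per subscriber.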

Page 30: Scaling Realtime at DISQUS

gevent is nice

    # the code is too big to show here, so just import it
    # http://bitly.com/geventspawn

    from realertime.lib.spawn import Watchdog
    from realertime.lib.spawn import TimeSensitiveBackoff

Page 31: Scaling Realtime at DISQUS

data pipelines

    class Pipeline(object):
        # Each step is supplied by a mixin; the base class only wires them together.

        def parse_data(self, data):
            raise NotImplementedError('No ParserMixin used')

        def compute_data(self, data, parsed_data):
            raise NotImplementedError('No ComputeMixin used')

        def publish_data(self, data, parsed_data, computed_data):
            raise NotImplementedError('No PublisherMixin used')

        def handle(self, data):
            # parse -> compute -> publish
            parsed_data = self.parse_data(data)
            computed_data = self.compute_data(data, parsed_data)
            return self.publish_data(data, parsed_data, computed_data)

Page 32: Scaling Realtime at DISQUS

Example Mixins

    import json
    import urllib2  # Python 2

    class JSONParserMixin(Pipeline):
        def parse_data(self, data):
            return json.loads(data)

    class AnnomizeDataMixin(Pipeline):
        # compute step: throw everything away
        def compute_data(self, data, parsed_data):
            return {}

    class SuperSecureEncryptDataMixin(Pipeline):
        # compute step: "encrypt" with rot13 (Python 2 str codec)
        def compute_data(self, data, parsed_data):
            return parsed_data.encode('rot13')

    class HTTPPublisherMixin(Pipeline):
        # publish step: POST the computed data to self.dat_url
        def publish_data(self, data, parsed_data, computed_data):
            u = urllib2.urlopen(self.dat_url, computed_data)
            return u

    class FilePublisherMixin(Pipeline):
        # publish step: append the computed data to self.output
        def publish_data(self, data, parsed_data, computed_data):
            with open(self.output, 'a') as f:
                f.write(computed_data)

Page 33: Scaling Realtime at DISQUS

Finished Pipeline

    class JSONAnnonHTTPPipeline(
            JSONParserMixin,
            AnnomizeDataMixin,
            HTTPPublisherMixin):
        pass

    class JSONSecureHTTPPipeline(
            JSONParserMixin,
            SuperSecureEncryptDataMixin,
            HTTPPublisherMixin):
        pass

    class JSONAnnonFilePipeline(
            JSONParserMixin,
            AnnomizeDataMixin,
            FilePublisherMixin):
        pass
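
To see the composition run end to end, here is a small self-contained Python 3 sketch of the same pattern. The base class follows the slide; `KeepKeysMixin` and `ListPublisherMixin` are hypothetical stand-ins chosen so the example needs no network or files. Each mixin overrides exactly one step, and Python's method resolution order picks the first class in the list that defines it.

```python
import json


class Pipeline:
    """Base class: each step must be supplied by a mixin."""

    def parse_data(self, data):
        raise NotImplementedError("No ParserMixin used")

    def compute_data(self, data, parsed_data):
        raise NotImplementedError("No ComputeMixin used")

    def publish_data(self, data, parsed_data, computed_data):
        raise NotImplementedError("No PublisherMixin used")

    def handle(self, data):
        parsed = self.parse_data(data)
        computed = self.compute_data(data, parsed)
        return self.publish_data(data, parsed, computed)


class JSONParserMixin(Pipeline):
    def parse_data(self, data):
        return json.loads(data)


class KeepKeysMixin(Pipeline):
    # Hypothetical compute step: keep only whitelisted keys.
    keep = ("id", "forum")

    def compute_data(self, data, parsed_data):
        return {k: parsed_data[k] for k in self.keep if k in parsed_data}


class ListPublisherMixin(Pipeline):
    # Hypothetical publisher: append to an in-memory list instead of HTTP.
    def __init__(self):
        self.published = []

    def publish_data(self, data, parsed_data, computed_data):
        self.published.append(computed_data)
        return computed_data


class JSONKeepListPipeline(JSONParserMixin, KeepKeysMixin, ListPublisherMixin):
    pass


pipeline = JSONKeepListPipeline()
result = pipeline.handle('{"id": 7, "forum": "cnn", "email": "x@example.com"}')
```

A pipeline that is missing a step fails loudly on the first `handle` call, because the base class method raises `NotImplementedError`.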

Page 34: Scaling Realtime at DISQUS

real live DISQUS code

    class FEOrbitalNginxMultiplexer(
            SchemaTransformerMixin,
            JSONFormatterMixin,
            SelfChannelsMixin,
            HTTPPublisherMixin):

        def __init__(self, domains, api_version=1):
            schema_namespace = 'orbital'
            self.channels = ('orbital', )

            super(FEOrbitalNginxMultiplexer, self).__init__(
                domains=domains,
                api_version=api_version,
                schema_namespace=schema_namespace)

    class FEPublicAckingMultiplexer(
            PublicTransformerMixin,
            JSONFormatterMixin,
            FEChannelsMixin,
            ThoonkQueuePubSubPublisherMixin):

        def __init__(self, domains, api_version):
            schema_namespace = 'general'
            super(FEPublicAckingMultiplexer, self).__init__(
                domains=domains,
                api_version=api_version,
                schema_namespace=schema_namespace)

Page 36: Scaling Realtime at DISQUS

nginx push stream

๏ follow John Watson (@wizputer) for updated #humblebrags as we ramp up traffic
๏ an example config can be found here: http://bit.ly/disqus-nginx-push-stream

http://wiki.nginx.org/HttpPushStreamModule

Page 37: Scaling Realtime at DISQUS

nginx push stream

๏ Replaced webservers and Redis Pub/Sub
๏ But starting with Pub/Sub was important for us
    ๏ Encouraged us to over publish on keys

Page 38: Scaling Realtime at DISQUS

nginx push stream

๏ Turned on for 70% of our network...
๏ ~950K subscribers (peak single machine)
๏ peak 40 MBytes/second (per machine)
๏ CPU usage is still well under 15%
๏ 99.845% active writes (the socket is written to often enough to come up as ACTIVE)

Page 39: Scaling Realtime at DISQUS

config push stream

    location = /pub {
        allow 127.0.0.1;
        deny all;

        push_stream_publisher admin;
        set $push_stream_channel_id $arg_channel;
    }

    location ^~ /sub/ {
        # to maintain api compatibility we need this
        location ~ /sub/(.*)/(.*)$ {
            # Url encoding things? $1%3A2$2
            set $push_stream_channels_path $1:$2;

            push_stream_subscriber streaming;
            push_stream_content_type application/json;
        }
    }
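
The nested `location` is what turns a subscribe URL such as `/sub/forum/cnn` into the channel `forum:cnn`. A minimal Python sketch of that mapping, reusing the same regular expression, looks like this; the function name `channel_from_path` is hypothetical and the code only imitates the rewrite, it does not talk to nginx.

```python
import re

# Same pattern as the nested location in the config: /sub/<type>/<id>
SUB_PATH = re.compile(r"/sub/(.*)/(.*)$")


def channel_from_path(path):
    """Return the 'type:id' channel a subscribe path maps to, or None."""
    match = SUB_PATH.search(path)
    if match is None:
        return None
    # Equivalent of: set $push_stream_channels_path $1:$2;
    return "%s:%s" % (match.group(1), match.group(2))


forum_channel = channel_from_path("/sub/forum/cnn")
thread_channel = channel_from_path("/sub/thread/907824578")
not_a_sub = channel_from_path("/pub")
```

Because the first group is greedy, a path with extra slashes keeps everything up to the last slash in the first capture, which is worth knowing before putting slashes in channel ids.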

Page 40: Scaling Realtime at DISQUS

examples

    # Subs
    curl -s 'localhost/sub/forum/cnn'
    curl -s 'localhost/sub/thread/907824578'
    curl -s 'localhost/sub/user/northisup'

    # Pubs
    curl -s -X POST 'localhost/pub?channel=forum:cnn' \
        -d '{"some sort": "of json data"}'

    curl -s -X POST 'localhost/pub?channel=thread:907824578' \
        -d '{"more": "json data"}'

    curl -s -X POST 'localhost/pub?channel=user:northisup' \
        -d '{"the idea": "I think you get it by now"}'
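
The same publish call can be built from Python with only the standard library. This is a sketch under assumptions: the helper name `build_publish_request` and the `base` default are hypothetical, and the request is constructed but not sent, since sending needs a running nginx with the push stream module. The colon in the channel name is deliberately left unescaped so the URL matches the curl examples; the slides do not say whether an escaped `%3A` would be treated the same way.

```python
import json
import urllib.parse
import urllib.request


def build_publish_request(channel, payload, base="http://localhost"):
    """Build (but do not send) a POST to the /pub endpoint for one channel."""
    # safe=":" keeps 'forum:cnn' literal, as in the curl examples.
    query = urllib.parse.urlencode({"channel": channel}, safe=":")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        "%s/pub?%s" % (base, query),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_publish_request("forum:cnn", {"more": "json data"})
# Sending is one more call, left out here because it needs a live server:
# urllib.request.urlopen(request)
```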

Page 41: Scaling Realtime at DISQUS

measure nginx

    location = /push-stream-status {
        allow 127.0.0.1;
        deny all;

        push_stream_channels_statistics;
        set $push_stream_channel_id $arg_channel;
    }

Page 43: Scaling Realtime at DISQUS

long(er) polling

    onProgress: function () {
        var self = this;
        var resp = self.xhr.responseText;
        var advance = 0;
        var rows;

        // If server didn't push anything new, do nothing.
        if (!resp || self.len === resp.length) return;

        // Server returns JSON objects, one per line.
        rows = resp.slice(self.len).split('\n');

        // The last element is either empty or a partial line still in
        // flight; leave it for the next progress event.
        rows.pop();

        _.each(rows, function (obj) {
            advance += (obj.length + 1);
            if (!obj) return;  // skip blank lines
            obj = JSON.parse(obj);
            self.trigger('progress', obj);
        });
        self.len += advance;
    }
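
The bookkeeping in this handler is easier to check in isolation. Below is a Python sketch of the same idea: the response body only ever grows, so the client remembers how many characters it has consumed and parses just the complete lines beyond that offset. The class name `LineStream` is hypothetical, and the sketch is a model of the technique rather than a port of the client code.

```python
import json


class LineStream:
    """Consume newline-delimited JSON from a response body that only grows."""

    def __init__(self):
        self.consumed = 0  # characters of the response already handled

    def on_progress(self, response_text):
        """Return parsed objects for complete lines not yet seen."""
        fresh = response_text[self.consumed:]
        end = fresh.rfind("\n")
        if end == -1:
            return []  # no complete line yet, wait for more data
        complete = fresh[:end]
        self.consumed += end + 1
        return [json.loads(line) for line in complete.split("\n") if line]


stream = LineStream()
first = stream.on_progress('{"a": 1}\n{"b"')            # second line is partial
second = stream.on_progress('{"a": 1}\n{"b": 2}\n')     # now it is complete
third = stream.on_progress('{"a": 1}\n{"b": 2}\n')      # nothing new
```

Only advancing the offset past the last newline means a line split across two progress events is parsed once, when its second half arrives.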

Page 44: Scaling Realtime at DISQUS

Soon... EventSource

    // Currently EventSource has CORS issues
    var ev = new EventSource(dat_url);
    ev.addEventListener("Post", handlePostEvent);

Page 45: Scaling Realtime at DISQUS

test, measure, repeat

Page 46: Scaling Realtime at DISQUS

test

๏ Darktime
    ๏ use existing network to load test
    ๏ (user complaints when it didn’t work...)
๏ Darkesttime
    ๏ load testing a single thread
๏ have knobs you can twiddle

Page 47: Scaling Realtime at DISQUS

measure

๏ measure all the things!
๏ especially when the numbers don’t line up
๏ measuring is hard in distributed systems
๏ try to express things as +1 and -1 if you can
๏ Sentry for measuring exceptions
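
One way to read the "+1 and -1" advice is that every event is recorded as a signed increment, so the current value of a gauge is just the sum of its increments. The sketch below illustrates that reading; the `Metrics` class and the `connections` metric name are hypothetical and not taken from the talk or from any metrics library.

```python
from collections import Counter


class Metrics:
    """Every event is a signed increment; current values are just sums."""

    def __init__(self):
        self.counts = Counter()

    def incr(self, name, delta=1):
        self.counts[name] += delta


metrics = Metrics()


def on_connect():
    metrics.incr("connections", +1)


def on_disconnect():
    metrics.incr("connections", -1)


for _ in range(5):
    on_connect()
for _ in range(2):
    on_disconnect()

open_connections = metrics.counts["connections"]
```

Increments from many machines can be added together in any order, which avoids having each server report an absolute number that must then be reconciled.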

Page 48: Scaling Realtime at DISQUS

pretty graphs

Page 49: Scaling Realtime at DISQUS

how does it really scale?

[graph annotations: POPE; white smoke; francis announced]

Page 50: Scaling Realtime at DISQUS

maths

Page 51: Scaling Realtime at DISQUS

it’s been a busy few weeks

Page 52: Scaling Realtime at DISQUS

wha?

๏ People do weird stuff with your stuff
    ๏ turned off this server in Oct 2012
    ๏ Still getting 100 req/sec

Page 53: Scaling Realtime at DISQUS

lessons

๏ do hard (computation) work early
๏ end-to-end acks are good, but expensive
๏ redis/nginx pubsub is effectively free

Page 54: Scaling Realtime at DISQUS

If this was interesting to you...

psst, we’re hiring: disqus.com/jobs

Page 55: Scaling Realtime at DISQUS

special thanks

๏ the team at DISQUS
    ๏ like jeff a.k.a. @nfluxx who had to review all my code
๏ and especially our dev-ops guys
    ๏ like john watson a.k.a. @wizputer who found the nginx-push-stream module

psst, we’re hiring: disqus.com/jobs

Page 56: Scaling Realtime at DISQUS

slide full o’ links

๏ Nginx push stream module
    http://wiki.nginx.org/HttpPushStreamModule
๏ Thoonk (redis queue)
    http://github.com/andyet/thoonk.py
๏ Sentry (distributed traceback aggregation)
    http://github.com/dcramer/sentry
๏ Gevent (python coroutines and greenlets)
    http://gevent.org/
๏ Scales (in-app metrics)
    http://github.com/Greplin/scales

code.disqus.com

Page 57: Scaling Realtime at DISQUS

Come find me here!

PyCon 2013
Santa Clara Convention Center, Hall A-B
Santa Clara, CA

[expo hall booth map, revised 1/9/2013]

Page 58: Scaling Realtime at DISQUS

we are still hiring

psst, we’re hiring: disqus.com/jobs

Page 59: Scaling Realtime at DISQUS

Questions I have

๏ What is the best kernel config for webscale concurrency? Nginx?
๏ I <3 gevent, but what if I want to pypy?
๏ Nginx + lua? Seems kind of awesome.
๏ Composing data pipelines: good or bad?
๏ I didn’t have time to mention:
    ๏ Kafka, what is it good for?
    ๏ Seriously, why not RabbitMQ?

Page 60: Scaling Realtime at DISQUS

Adam Hitchcock @NorthIsUp

DISQUSsion?