Handling massive traffic with Python Òscar Vilaplana, Paylogic PyGrunn 2013


At Paylogic we handle massive online peak sales, with tens of thousands of customers arriving every second, each trying to get a chance to buy a ticket. We built a virtual queue to handle this load and sell the tickets in a fair order. This is how we did it (as much as I can tell you!). I presented this talk at PyGrunn 2013.


What’s the problem?

• High Traffic (>10k hits/s)

• Redirect traffic to Paylogic at a low rate

• Change redirected TPS

• Expect things to break

• Be fair, respect FIFO (within reason)

• Keep users informed
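The talk does not say how the redirected TPS is enforced or changed. One common way to do it is a token bucket whose rate can be adjusted at runtime. The sketch below is an assumption, not Paylogic's code; `AdmissionRate`, `set_tps` and `try_admit` are hypothetical names.

```python
import time


class AdmissionRate:
    """Token bucket deciding how many queued users may be redirected per second.

    Hypothetical helper: the talk does not describe the real mechanism.
    """

    def __init__(self, tps, clock=time.monotonic):
        self.tps = float(tps)      # users redirected per second (adjustable)
        self._clock = clock        # injectable so the class can be tested
        self._allowance = 0.0      # admissions currently available
        self._last = clock()

    def set_tps(self, tps):
        """Change the redirect rate while the queue is running."""
        self._refill()             # settle the time elapsed at the old rate
        self.tps = float(tps)

    def _refill(self):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Cap at one second's worth so an idle period cannot cause a burst.
        self._allowance = min(self.tps, self._allowance + elapsed * self.tps)

    def try_admit(self):
        """Return True if one more user may be redirected right now."""
        self._refill()
        if self._allowance >= 1.0:
            self._allowance -= 1.0
            return True
        return False
```

Setting the rate to 0 holds sales without touching the queue itself; raising it again resumes redirects.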


In more detail

• Open/hold/close sales

• Expect any server to go down

• Expect ALL servers to go down

• Expect users to disappear

• Display expected waiting time and other info

• Keep it working

• Prevent attacks


How It Works

• A horde of customers appears!

• see a pretty page.

• get a position in the queue.

• page auto-refreshes.

• your turn? to the Frontoffice!

• meanwhile info is shown.

• (waiting time, information from event managers…)


Data Storage

• Estimates

    • Not much data, stored in the instances and synced.

• Tokens

    • A LOT of data!

    • way too much to store and sync

    • use distributed storage

    • (the browsers)
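Keeping the tokens in the browsers only works if an instance can check a returned token without looking anything up. The talk does not say how tokens are protected; a minimal sketch, assuming they are signed with an HMAC key shared by all queue instances, could look like this. `SECRET_KEY`, `issue_token`, `validate_token` and the payload fields are hypothetical.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key; in this sketch every queue instance shares the same one.
SECRET_KEY = b"replace-with-a-real-secret"


def _sign(payload):
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()


def issue_token(position, issued_at):
    """Build a token the browser stores and sends back on every refresh."""
    payload = json.dumps({"pos": position, "iat": issued_at}, sort_keys=True).encode()
    return "{}.{}".format(
        base64.urlsafe_b64encode(payload).decode(),
        base64.urlsafe_b64encode(_sign(payload)).decode(),
    )


def validate_token(token):
    """Return the payload dict if the signature is valid, otherwise None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except (ValueError, TypeError):
        return None  # malformed token
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature, _sign(payload)):
        return None
    return json.loads(payload.decode("utf-8"))
```

With this design the instances store nothing per user: any instance can validate any token, which fits the requirement that instances can be added or removed at will.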


Architecture

• ELB

• Queue Instances

• Bouncer Process

• Syncer Process

• HTML/JS Queue Page in Cloudfront


ELB

• Auto-scales (but not fast enough).

• Many regions.

• Can boot/kill instances automatically.

    • We don’t do it yet.


Queue Instances

• EC2 instances, which handle the traffic.

• All identical, sync with each other.

• They can be added or removed at will.

• If some (but not all) die, the users won’t notice.

• If all die, only the statistics will be affected.

• (Never happened).


Users Handler

• Give out and validate tokens.

• Determine if the user should:

    • Keep waiting

    • Go to the Frontoffice

    • See the Sold Out page.

• Return the expected waiting time.

• Return the values configured by the Event Managers.
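The decision the Users Handler makes can be sketched as one pure function. This is an illustration under assumptions: positions are integers, `now_serving` is the highest position already let through, and the waiting time is estimated as users ahead divided by the current admission rate. None of these names or formulas come from the talk.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    GO_TO_FRONTOFFICE = "frontoffice"
    SOLD_OUT = "sold_out"


def handle_user(position, now_serving, sold_out, admitted_per_second):
    """Decide what a queued user should see next.

    position            -- the user's place in the queue (from a validated token)
    now_serving         -- highest position already let through (hypothetical)
    sold_out            -- True once the sale is closed
    admitted_per_second -- current redirect rate; 0 means sales are on hold
    """
    if sold_out:
        return {"action": Action.SOLD_OUT, "wait_seconds": None}
    if position <= now_serving:
        return {"action": Action.GO_TO_FRONTOFFICE, "wait_seconds": 0.0}
    ahead = position - now_serving
    # No estimate is possible while nobody is being let through.
    wait = ahead / admitted_per_second if admitted_per_second > 0 else None
    return {"action": Action.WAIT, "wait_seconds": wait}
```

For example, a user at position 110 with position 10 being served and 50 users admitted per second would be told to wait about 2 seconds.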


Synchronization of Statistics

• Keep the Queue Instances synced so they know:

    • How many users are waiting.

    • How to calculate the waiting time.

    • How many users are being let through by the system.
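The talk does not describe the sync protocol. One technique that tolerates lost, repeated or reordered sync messages is a grow-only counter: each instance only increments its own entry, and merging takes the per-instance maximum. The sketch below swaps in that technique as an assumption; `merge_counters`, `total` and the instance ids are hypothetical.

```python
def merge_counters(local, remote):
    """Merge two {instance_id: count} maps by taking the larger count per instance.

    Each instance only ever increases its own entry, so the larger value is
    always the more recent one. Merging is commutative and idempotent.
    """
    merged = dict(local)
    for instance_id, count in remote.items():
        merged[instance_id] = max(merged.get(instance_id, 0), count)
    return merged


def total(counters):
    """Queue-wide value of a counter, summed over all known instances."""
    return sum(counters.values())


def users_waiting(issued, admitted):
    """Users still in the queue: tokens given out minus users let through."""
    return total(issued) - total(admitted)
```

Because a merge never loses information, an instance that rejoins after an outage catches up from whichever peer it hears from first.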


HTML/JS Queue Page in Cloudfront

• Uses Handlebars

• Served by Cloudfront so that the Queue keeps looking good even if all our servers are down.

• Updated frequently.

• Calls the Load Balancer. Error? Retry.

• Errors are very rare.
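The real queue page is HTML/JS, so the "Error? Retry." step runs in the browser. The same logic is sketched here in Python to stay consistent with the rest of the talk. The exponential backoff with jitter is an assumption (the slides only say "retry"), and `call_with_retry` is a hypothetical name.

```python
import random
import time


def call_with_retry(fetch, attempts=5, base_delay=0.5, sleep=time.sleep, rng=random):
    """Call fetch() until it succeeds, backing off between failed attempts.

    fetch is any zero-argument callable that raises on failure, for example
    a function that requests the queue status from the load balancer.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the last error
            # Random delay below a doubling cap, so that thousands of clients
            # do not all retry at the same instant.
            sleep(rng.uniform(0, base_delay * (2 ** attempt)))
```

Spreading the retries out matters here: with tens of thousands of waiting users, synchronized retries would look like a second traffic peak.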


Deployment

• Debs in private repos.

• Installed through tunnel.

• Custom python2deb tool (to be released).


Stress test

• Custom client with human-like behaviour.

• Notify Amazon!
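The custom client itself is not shown in the talk. A minimal sketch of what "human-like" could mean: irregular refresh intervals and a small chance of giving up, since users are expected to disappear. `simulate_user`, its parameters and the action strings are all hypothetical.

```python
import random


def simulate_user(poll, rng, sleep, min_wait=2.0, max_wait=8.0,
                  quit_chance=0.01, max_polls=1000):
    """Poll the queue the way an impatient human would and return the outcome.

    poll  -- zero-argument callable returning "wait" or a final action
    rng   -- random.Random instance, seeded for reproducible test runs
    sleep -- callable taking seconds (time.sleep in a real run)
    """
    for _ in range(max_polls):
        action = poll()
        if action != "wait":
            return action            # e.g. "frontoffice" or "sold_out"
        if rng.random() < quit_chance:
            return "abandoned"       # the user closed the tab
        # Humans do not refresh on a fixed timer; vary the interval.
        sleep(rng.uniform(min_wait, max_wait))
    return "timed_out"
```

A stress run would start many of these concurrently, each with its own seed, and count the outcomes.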


What we learned

• Debugging distributed apps is hard.

• Last bugs are nasty.

• ELB doesn’t scale fast enough by itself.


Q&A
