Alex Poon VP of Engineering
Storm @ Visual Revenue (an Outbrain Company)
Who are we?
What do we do?
[Diagram components: Customer Traffic, Web Servers, Kafka, Storm, Data Transform/Aggregation, Databases, Dashboard, Algo, Automation]
• 14B page views per month
• At peak, 8,000 to 10,000 page views per second
• Deployed Storm to production about one month ago
• Storm cluster of ~50 instances on AWS
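As a rough sanity check on these figures (assuming a 30-day month), the monthly volume corresponds to an average of about 5,400 page views per second, which puts the quoted peak at roughly 1.5 to 1.9 times the average:

```latex
\[
\frac{14 \times 10^{9}\ \text{views}}{30 \times 86{,}400\ \text{s}}
  = \frac{14 \times 10^{9}}{2.592 \times 10^{6}}\ \text{views/s}
  \approx 5{,}400\ \text{views/s},
\qquad
\frac{8{,}000}{5{,}400} \approx 1.5,
\quad
\frac{10{,}000}{5{,}400} \approx 1.9
\]
```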
Before Storm
• Built our own distributed data processing system
• ZeroMQ (ZMQ) for messaging
• Batch-based processing
• Processing partitioned by hashing on customer
• Advantages
• Simple in-house system built from very basic components
• Well understood
• Disadvantages
• Hard to scale; a constant battle to keep up with incoming pings
• Machine management was clumsy
• Uneven distribution of traffic
• Multiple processes doing similar work, wasting resources
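The pre-Storm system partitioned work by hashing on the customer. The minimal sketch below assumes that events are routed to a fixed pool of workers by customer ID (the class and method names are hypothetical, not Visual Revenue's code). It also illustrates one way such a scheme can produce uneven traffic distribution: every event from a single large customer lands on the same worker.

```java
import java.util.List;

/**
 * Hypothetical sketch: route each event to one of N workers by hashing its customer ID.
 */
final class CustomerPartitioner {
    private final int numWorkers;

    CustomerPartitioner(int numWorkers) {
        if (numWorkers <= 0) {
            throw new IllegalArgumentException("numWorkers must be positive");
        }
        this.numWorkers = numWorkers;
    }

    /** The same customer ID always maps to the same worker index in [0, numWorkers). */
    int workerFor(String customerId) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        return Math.floorMod(customerId.hashCode(), numWorkers);
    }

    /** Count how many events each worker receives, given the customer ID of each event. */
    int[] loadPerWorker(List<String> customerIdsOfEvents) {
        int[] load = new int[numWorkers];
        for (String id : customerIdsOfEvents) {
            load[workerFor(id)]++;
        }
        return load;
    }
}
```

For example, with four workers and a stream in which one customer produces 900 of 1,000 events, the worker that `workerFor` returns for that customer receives at least 900 events, while the other workers share what is left. Storm's fields grouping also routes tuples by hashing a key, so the same skew concern can apply there.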
Why Kafka/Storm?
• Kafka
• Open-source, distributed publish-subscribe messaging system
• Storm
• Open-source, real-time system for continuous computation
• They are awesome
• Distributed, highly scalable, and fault tolerant
• High throughput
• Reliable
• Real-time
• Great at in-memory analytics and real-time decision support
Data Aggregation
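[Diagram: Storm topology of spouts and bolts (legend labels: @HandleSpout, Bolt), with per-component intervals: URL 15s, Aggregate 15s, Customer 15s, Front Page 15s, Position 5m, Arrangement 15s, Tweet 5m, Aggregate 15s]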
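The diagram labels most components with 15-second intervals and a few (Position, Tweet) with 5-minute intervals. Assuming those labels denote aggregation windows, the minimal sketch below shows the kind of fixed (tumbling) window count such a bolt might maintain. It assumes each event carries a key such as a URL and a millisecond timestamp. The class and method names are hypothetical, and this is not Visual Revenue's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a tumbling-window counter: counts events per key
 * (e.g. a URL) in fixed windows such as 15 s or 5 min.
 */
final class TumblingWindowCounter {
    private final long windowMillis;
    // window start (ms) -> (key -> count); TreeMap keeps windows ordered by time
    private final TreeMap<Long, Map<String, Long>> windows = new TreeMap<>();

    TumblingWindowCounter(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
    }

    /** Align a timestamp to the start of the window that contains it. */
    long windowStart(long timestampMillis) {
        return Math.floorDiv(timestampMillis, windowMillis) * windowMillis;
    }

    /** Count one event for the given key in the window containing its timestamp. */
    void add(String key, long timestampMillis) {
        windows.computeIfAbsent(windowStart(timestampMillis), w -> new HashMap<>())
               .merge(key, 1L, Long::sum);
    }

    /**
     * Remove and return every window that ended at or before nowMillis.
     * Late events for an already-flushed window would start a new entry; this sketch ignores them.
     */
    Map<Long, Map<String, Long>> flushCompleted(long nowMillis) {
        Map<Long, Map<String, Long>> done = new TreeMap<>();
        while (!windows.isEmpty() && windows.firstKey() + windowMillis <= nowMillis) {
            Map.Entry<Long, Map<String, Long>> e = windows.pollFirstEntry();
            done.put(e.getKey(), e.getValue());
        }
        return done;
    }
}
```

In a real bolt, `flushCompleted` would need a periodic trigger (for instance Storm's tick tuples) so that windows are emitted even when traffic is quiet. That wiring is omitted here.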
Learnings / Ideas
1. Kafka + ZooKeeper is extremely scalable and easy to set up. Check out the Brod library if you are using Python
2. Use the Storm UI (alongside Ganglia) to monitor your cluster
3. Shell Bolts were inefficient and hard to debug (at least for us)
4. Upgrade to at least Storm 0.8.2, which adds capacity metrics, among other improvements
5. Storm's anchoring/replay capability is awesome but comes with noticeable overhead
6. Use a good framework to manage your cluster; we use SaltStack
7. Our unit tests are written in JUnit. Most of Storm's built-in testing utilities are currently available only in Clojure
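Storm's guaranteed-message-processing documentation describes tracking each spout tuple's tree with a single 64-bit value that XORs together the random IDs of tuples as they are anchored and acked. The tree is complete when that value returns to zero. The simplified sketch below models that bookkeeping and helps show where the overhead in item 5 comes from: every processed tuple triggers an extra ack message and a map update. The class and method names are hypothetical, it uses the root ID as the tracking key for simplicity, and it is not Storm's acker implementation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified, hypothetical model of XOR-based tuple-tree tracking.
 */
final class XorAckTracker {
    // root tuple ID -> XOR of all tuple IDs anchored or acked in its tree so far
    private final Map<Long, Long> pending = new HashMap<>();

    /** A spout emitted a new root tuple; start tracking its tree. */
    void rootEmitted(long rootId) {
        pending.put(rootId, rootId);
    }

    /**
     * A bolt finished processing tuple processedId (in the tree of rootId) and
     * emitted childIds anchored to it. Returns true once the whole tree is done.
     */
    boolean acked(long rootId, long processedId, long... childIds) {
        Long current = pending.get(rootId);
        if (current == null) {
            throw new IllegalStateException("unknown root " + rootId);
        }
        long val = current ^ processedId; // cancels the XOR added when this tuple was anchored
        for (long child : childIds) {
            val ^= child; // cancelled again later, when that child is acked
        }
        if (val == 0L) {
            pending.remove(rootId);
            return true;
        }
        pending.put(rootId, val);
        return false;
    }

    /** Number of tuple trees still in flight. */
    int pendingTrees() {
        return pending.size();
    }
}
```

Because real tuple IDs are random 64-bit values, the XOR can reach zero by accident before the tree is complete, but the probability is extremely small.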
Thank You
Alex Poon
@alexpoon06 @Outbrain
Yes, it is true: we are hiring!
www.visualrevenue.com/jobs