Highly Available Graphite

Initially presented at the OpenWest 2014 conference. Graphite and StatsD gather time series data and offer a robust set of APIs to access that data. While the tools are robust, the dashboards are straight from 1992 and alerting off the data is nonexistent. Nark, an open source project, solves both of these problems. It provides easy-to-use dashboards and readily available alerts and notifications to users. It has been used in production at Lucid Software for almost a year. Related to Nark are the tools required to make Graphite highly available.


<ul><li>1. GRAPHITE: HIGHLY AVAILABLE Alyssa Stringham &amp; Matthew Barlocker</li></ul> <p>
2. About Alyssa: Software Developer at Lucid Software Inc. BYU graduate with a Bachelor's in Computer Science. I love: playing the carillon and piano; fast-paced board games; hats; traveling; playing foosball.
3. About The Barlocker: Chief Architect at Lucid Software Inc. Bachelor's degree from BYU in Computer Science. I love to: play board games; go 4-wheeling; wrestle my sons; fly airplanes. Follow me on nineofclouds.blogspot.com
4. Tools
5. Graphite: Graphite is a highly scalable real-time graphing system. Initially developed by Chris Davis at Orbitz.com. Comprised of 3 related projects: Carbon (collects and records metrics); Whisper (backend storage mechanism); Graphite-Web (HTTP frontend that displays graphs). Written in Python. http://graphite.wikidot.com/ https://github.com/graphite-project/
6. StatsD: A network daemon that aggregates statistics for backend services. Developed by Etsy. Written in Node.js. https://github.com/etsy/statsd/ http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
7. HA Receiver: Used to make StatsD highly available and scalable. Initially developed by Matthew Barlocker at Lucid Software Inc. Written in Node. https://github.com/lucidsoftware/statsd-ha-receiver
8. Nark: Nark is an alerting and dashboard frontend for Graphite. Under active development by Lucid Software. Written in Scala using the Play! Framework. MySQL backed. https://github.com/lucidchart/nark
9. Demo
10. Data Flow Overview
11. Data Flows IN: Applications report different types of metrics. StatsD aggregates metrics. Carbon-cache gathers and groups metrics. Whisper stores metrics to disk.
12. Data Flows OUT: User initiates request over HTTP. Graphite-web requests information from carbon-cache. Carbon-cache reads data from disk using whisper. Graphite-web builds graph using data.
13. High Availability &amp; Scaling
14. StatsD - Options: We can put StatsD in 3 places. On the reporting server: scales as well as your reporting servers do; as available as the reporting servers are; can't get vital metrics like stats.production.applications.chart.users.login. On a central server: doesn't scale; single point of failure. On a load-balanced set of servers: AWS ELB doesn't listen on UDP; one stat will be aggregated in multiple places.
15. StatsD - Solution: StatsD with smart-repeater on reporting servers. Accepts UDP and sends TCP for reliability. Reduces chattiness over the wire. Allows aggregation to occur at a centralized location. As scalable and available as the application servers.
16. StatsD - Solution: AWS Elastic Load Balancer distributes traffic to ha-receivers. HA-receivers: duplicate and transform metrics; deliver metrics to the correct server for aggregation; are stateless, so they scale horizontally; are highly available behind the ELB.
17. StatsD - Solution: HA-receivers pass the data to StatsD. StatsD does the final aggregation. Every metric has exactly one StatsD destination. Aggregated metrics are sent to carbon.
18. Carbon &amp; Whisper: Carbon and whisper direct data to disk. The daemons are stateless except for buffers. Carbon consists of multiple daemons. Carbon-relay: directs traffic to other carbon daemons. Carbon-aggregator: a mix between carbon-relay and StatsD. Carbon-cache: gathers metrics in a buffer and writes them to disk using whisper. Whisper is called from carbon-cache, and is short-lived.
19. Carbon &amp; Whisper: We chose to use sharding. Every server holds 1/n of the metrics, where n is the number of shards. All servers in a shard hold the same data. Syncing data requires a single rsync. A binary tree of carbon-relays is used to pick a shard. Adding new shards is as easy as adding a new node in the tree of carbon-relays. Retrieving data can be done by checking one server from every shard.
20. Carbon &amp; Whisper: StatsD sends metrics to the root carbon-relay on localhost. The carbon-relays are set up in a binary tree to pick a shard. Every metric goes to exactly one shard. Every carbon-relay forwards to either 1 shard or 2 relays.
21. Carbon &amp; Whisper: Carbon-cache receives the metrics from the final relay. Metrics are written to disk using whisper on localhost. Carbon-cache has a last-in-wins policy.
22. graphite-web: Graphite-web is stateless. All state is contained within carbon-cache. Reading data out from a highly available, scalable graphite installation is the same as reading from a single server. Use the same ELB as the ha-receiver.
23. Nark: Nark is stateless. All state is contained in MySQL and Graphite. Nark will be no more highly available than your MySQL and Graphite installations. Use an ELB, an autoscale group, and a multi-AZ RDS instance.
24. Recap
25. Questions? Feature Requests? Thanks For Your Time
26. Join The Team: Building the next generation of collaborative web applications. VC funded. High growth rate. Profitable. Graduates from Harvard, MIT, Stanford. Former Google, Amazon, Microsoft employees. https://www.golucid.co/jobs
</p>
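The "Data Flows IN" slide has applications reporting metrics to StatsD, which then passes aggregated values on to carbon. As a rough sketch of what travels over the wire, the Python below formats a StatsD datagram (`name:value|type`) and a line in carbon's plaintext protocol (`path value timestamp`). The helper names are hypothetical, and the host and port defaults are only the conventional StatsD UDP settings, not values taken from the talk.

```python
import socket
import time


def statsd_line(name, value, metric_type):
    """Format one StatsD datagram: 'c' = counter, 'ms' = timer, 'g' = gauge."""
    if metric_type not in ("c", "ms", "g"):
        raise ValueError("unsupported metric type: %r" % (metric_type,))
    return f"{name}:{value}|{metric_type}"


def carbon_line(path, value, timestamp=None):
    """Format one line of carbon's plaintext protocol: 'path value timestamp'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_statsd(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; the host and port here are assumptions."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

For example, `send_statsd(statsd_line("chart.users.login", 1, "c"))` would count one login against a StatsD daemon listening on localhost.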
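The StatsD slides require that every metric has exactly one StatsD destination, so that one stat is never aggregated in two places. One way to get that property in a stateless router is to hash the metric name and pick a server from a fixed list. The sketch below illustrates the idea only; it is not the statsd-ha-receiver implementation, and the server addresses and function names are hypothetical.

```python
import hashlib

# Hypothetical StatsD aggregator addresses; not taken from the talk.
STATSD_SERVERS = ["statsd-1:8125", "statsd-2:8125", "statsd-3:8125"]


def pick_statsd(metric_name, servers=STATSD_SERVERS):
    """Map a metric name to exactly one server, with no shared state."""
    digest = hashlib.md5(metric_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def route_line(line, servers=STATSD_SERVERS):
    """Route a raw 'name:value|type' line by its metric name only."""
    name = line.split(":", 1)[0]
    return pick_statsd(name, servers)
```

Because the choice depends only on the metric name and the server list, any number of routers behind a load balancer send the same metric to the same aggregator. A plain modulo remaps many metrics when the list changes size, which is a limitation of this sketch.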
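The sharding slides describe a tree of carbon-relays in which every relay forwards to either one shard or two further relays, every metric lands in exactly one shard, and reads check one server from every shard. The Python model below is a minimal sketch of that routing logic under stated assumptions: the CRC-based left/right choice, the class names, and the server names are all illustrative, and real carbon-relay routing is configured rather than coded like this.

```python
import zlib
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Shard:
    name: str
    servers: List[str]  # every server in a shard holds the same data


@dataclass
class Relay:
    left: "Node"
    right: "Node"


Node = Union[Shard, Relay]


def pick_shard(node, metric):
    """Walk from the root relay down to the single shard that owns a metric."""
    depth = 0
    while isinstance(node, Relay):
        # Mix the depth into the hash so each level makes its own choice.
        h = zlib.crc32(f"{depth}:{metric}".encode("utf-8"))
        node = node.left if h % 2 == 0 else node.right
        depth += 1
    return node


def all_shards(node):
    """Collect the shards at the leaves, left to right."""
    if isinstance(node, Shard):
        return [node]
    return all_shards(node.left) + all_shards(node.right)


def read_targets(root):
    """Reads need one server from every shard."""
    return [shard.servers[0] for shard in all_shards(root)]


# Hypothetical three-shard layout: the right-hand leaf was split by a relay.
ROOT = Relay(
    Shard("a", ["a1", "a2"]),
    Relay(Shard("b", ["b1", "b2"]), Shard("c", ["c1", "c2"])),
)
```

In this model, adding a shard means replacing one leaf with a relay that has two shard children. Note that an unbalanced tree like `ROOT` does not split metrics evenly: shard `a` receives roughly half of them, while `b` and `c` receive roughly a quarter each.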

