GRAPHITE: HIGHLY AVAILABLE Alyssa Stringham & Matthew Barlocker

Highly Available Graphite

DESCRIPTION

Initially presented at the OpenWest 2014 conference. Graphite and StatsD gather time-series data and offer a robust set of APIs to access that data. While the tools are robust, the dashboards are straight from 1992 and alerting off the data is nonexistent. Nark, an open-source project, solves both of these problems. It provides easy-to-use dashboards and readily available alerts and notifications to users. It has been used in production at Lucid Software for almost a year. Related to Nark are the tools required to make Graphite highly available.

Page 1: Highly Available Graphite

GRAPHITE:

HIGHLY AVAILABLE

Alyssa Stringham & Matthew Barlocker

Page 2: Highly Available Graphite

About Alyssa

Software Developer at Lucid Software Inc

BYU graduate with a Bachelor's in Computer Science

I love

Playing the carillon and piano

Fast-paced board games

Hats

Traveling

Playing foosball

Page 3: Highly Available Graphite

About “The Barlocker”

• Chief Architect at Lucid Software Inc

• Bachelor's degree from BYU in Computer Science

• I love to

• play board games

• go 4-wheeling

• wrestle my sons

• fly airplanes

• Follow me on nineofclouds.blogspot.com

Page 4: Highly Available Graphite

Tools

Page 5: Highly Available Graphite

Graphite

Graphite is a highly scalable real-time graphing system

Initially developed by Chris Davis at Orbitz.com

Composed of 3 related projects

Carbon – collects and records metrics

Whisper – Backend storage mechanism

Graphite-Web – HTTP frontend that displays graphs

Written in Python

http://graphite.wikidot.com/

https://github.com/graphite-project/

Page 7: Highly Available Graphite

HA Receiver

Used to make StatsD highly available and scalable.

Initially developed by Matthew Barlocker at Lucid Software Inc

Written in Node

https://github.com/lucidsoftware/statsd-ha-receiver

Page 8: Highly Available Graphite

Nark

Nark is an alerting and dashboard frontend for Graphite.

Under active development by Lucid Software.

Written in Scala using the Play! Framework

MySQL backed

https://github.com/lucidchart/nark

Page 9: Highly Available Graphite

Demo

Page 10: Highly Available Graphite

Data Flow Overview

Page 11: Highly Available Graphite

Data Flows IN

Applications report different types of metrics

StatsD aggregates metrics

Carbon-cache gathers and groups metrics

Whisper stores metrics to disk
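
The first hop of this flow, an application reporting a metric to StatsD, can be sketched as below. This is a minimal illustration, not code from the talk: it assumes the plain StatsD line format (`name:value|type`) sent over UDP to StatsD's default port 8125, and the function names and the metric name are hypothetical.

```python
import socket


def format_metric(name: str, value: float, metric_type: str) -> bytes:
    """Build one StatsD line of the form <name>:<value>|<type>."""
    # c = counter, ms = timer, g = gauge
    if metric_type not in {"c", "ms", "g"}:
        raise ValueError(f"unsupported metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget a single metric line to a StatsD daemon over UDP."""
    # Host and port are assumptions; 8125 is the StatsD default.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric name: count one user login.
    send_metric(format_metric("chart.users.login", 1, "c"))
```

UDP keeps the reporting call cheap for the application: it never blocks on the metrics pipeline, at the cost of possibly dropping a datagram.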

Page 12: Highly Available Graphite

Data Flows OUT

User initiates request over HTTP

Graphite-web requests information from carbon-cache

Carbon-cache reads data from disk using whisper

Graphite-web builds graph using data
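
The user's HTTP request in this flow typically targets Graphite-web's render endpoint. The sketch below only builds such a URL; the host name is a placeholder and the helper name is hypothetical, while `target`, `from`, and `format` are standard render parameters.

```python
from urllib.parse import urlencode


def render_url(base_url: str, target: str, start: str = "-1h", fmt: str = "json") -> str:
    """Build a Graphite-web render URL for one metric target."""
    query = urlencode({"target": target, "from": start, "format": fmt})
    return f"{base_url.rstrip('/')}/render?{query}"


if __name__ == "__main__":
    # graphite.example.com is a placeholder host.
    print(render_url(
        "http://graphite.example.com",
        "stats.production.applications.chart.users.login",
    ))
```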

Page 13: Highly Available Graphite

High Availability & Scaling

Page 14: Highly Available Graphite

StatsD - Options

We can put StatsD in 3 places:

On the reporting server

Scales as well as your reporting servers do

As available as the reporting servers are

Can’t get vital cross-server metrics like stats.production.applications.chart.users.login, because each server aggregates only its own data

On a central server

Doesn’t scale

Single point of failure

On a load-balanced set of servers

AWS ELB doesn’t listen on UDP

One stat will be aggregated in multiple places

Page 15: Highly Available Graphite

StatsD - Solution

StatsD with smart-repeater on reporting servers

Accepts UDP and sends TCP for reliability

Reduces chattiness over the wire

Allows aggregation to occur at a centralized location

As scalable and available as the application servers
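
One way a local repeater can reduce chattiness is to pack many small metric lines into fewer, larger payloads before forwarding them over TCP. The sketch below illustrates that idea only. The slides do not describe the smart-repeater's actual wire format, so the newline-joined batches, the size limit, and the function name are all assumptions.

```python
from typing import Iterable, List


def batch_lines(lines: Iterable[bytes], max_bytes: int = 1400) -> List[bytes]:
    """Group metric lines into newline-joined payloads of at most max_bytes each.

    A single line longer than max_bytes is still emitted, alone in its own batch.
    """
    batches: List[bytes] = []
    current: List[bytes] = []
    size = 0
    for line in lines:
        # One extra byte for the newline separator when the batch is non-empty.
        extra = len(line) + (1 if current else 0)
        if current and size + extra > max_bytes:
            batches.append(b"\n".join(current))
            current, size = [], 0
            extra = len(line)
        current.append(line)
        size += extra
    if current:
        batches.append(b"\n".join(current))
    return batches
```

Each returned payload could then be written to one long-lived TCP connection, so the network sees a few large writes instead of one datagram per metric.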

Page 16: Highly Available Graphite

StatsD - Solution

AWS Elastic Load Balancer distributes traffic to ha-receivers

HA-receivers:

Duplicate and transform metrics

Deliver metrics to correct server for aggregation

Are stateless – they scale horizontally

Are highly available behind the ELB

Page 17: Highly Available Graphite

StatsD - Solution

HA-receivers pass the data to StatsD

StatsD does the final aggregation

Every metric has exactly one StatsD destination

Aggregated metrics are sent to carbon
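
For every metric to have exactly one StatsD destination, each stateless receiver must map a metric name to the same backend. A minimal sketch of one way to do that is hashing the name. The slides do not say how the ha-receiver actually chooses, so the hashing scheme, the function name, and the backend addresses here are assumptions.

```python
import hashlib
from typing import Sequence


def pick_statsd(metric_name: str, backends: Sequence[str]) -> str:
    """Deterministically map a metric name to one backend.

    Uses MD5 rather than Python's built-in hash(), which is randomized per
    process and would make different receivers disagree.
    """
    if not backends:
        raise ValueError("at least one backend is required")
    digest = hashlib.md5(metric_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    # Hypothetical StatsD backends.
    backends = ["statsd-1:8125", "statsd-2:8125", "statsd-3:8125"]
    print(pick_statsd("chart.users.login", backends))
```

Because the mapping depends only on the metric name and the backend list, any receiver behind the load balancer reaches the same answer without shared state. Simple modulo hashing remaps most metrics when the backend list changes; a consistent-hash ring would limit that.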

Page 18: Highly Available Graphite

Carbon & Whisper

Carbon and whisper direct data to disk

The daemons are stateless except for buffers

Carbon consists of multiple daemons

Carbon-relay: Directs traffic to other carbon daemons

Carbon-aggregator: A mix between carbon-relay and StatsD

Carbon-cache: Gathers metrics in a buffer, and writes them to disk using whisper

Whisper is called from carbon-cache, and is short-lived

Page 19: Highly Available Graphite

Carbon & Whisper

We chose to use sharding

Every server holds 1/n of the metrics, where n is the number of shards

All servers in a shard hold the same data

Syncing data requires a single rsync

A binary tree of carbon-relays is used to pick a shard

Adding new shards is as easy as adding a new node in the binary tree of carbon-relays

Retrieving data can be done by checking one server from every shard

Page 20: Highly Available Graphite

Carbon & Whisper

StatsD sends metrics to the root carbon-relay on localhost

Carbon-relays are set up in a binary tree to pick a shard

Every metric goes to exactly one shard

Every carbon-relay forwards to either 1 shard or 2 relays
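
The relay tree can be modeled as nodes that are either a leaf (one shard) or an inner node (two child relays). The sketch below is an illustration under assumptions: the slides do not state how each relay splits its traffic, so consuming one bit of an MD5 hash per level is a stand-in, and the `Relay` class and shard names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Relay:
    """A relay node: a leaf names one shard, an inner node has two child relays."""
    shard: Optional[str] = None
    left: Optional["Relay"] = None
    right: Optional["Relay"] = None

    def __post_init__(self) -> None:
        children = [c for c in (self.left, self.right) if c is not None]
        valid = not children if self.shard is not None else len(children) == 2
        if not valid:
            raise ValueError("a relay forwards to exactly 1 shard or exactly 2 relays")


def route(metric: str, root: Relay) -> str:
    """Walk from the root relay to a leaf, using one hash bit per level."""
    bits = int.from_bytes(hashlib.md5(metric.encode("utf-8")).digest(), "big")
    node = root
    while node.shard is None:
        node = node.right if bits & 1 else node.left
        bits >>= 1
    return node.shard


if __name__ == "__main__":
    tree = Relay(
        left=Relay(shard="shard-a"),
        right=Relay(left=Relay(shard="shard-b"), right=Relay(shard="shard-c")),
    )
    print(route("stats.production.applications.chart.users.login", tree))
```

Adding a shard in this model means replacing one leaf with an inner node that has two leaves. With an even split per level, an unbalanced tree gives shallower shards a larger share of the metrics.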

Page 21: Highly Available Graphite

Carbon & Whisper

Carbon-cache receives the metrics from the final relay

Metrics are written to disk using whisper on localhost

Carbon-cache has a last-in-wins policy
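
The slide does not elaborate on "last-in-wins". One reading is that when two values arrive for the same metric and the same time slot, the later arrival overwrites the earlier one. The sketch below models that reading with an in-memory buffer. The class name and the 60-second step are assumptions, and this is not carbon-cache's actual code.

```python
from typing import Dict, List, Tuple


class LastInWinsBuffer:
    """Buffer datapoints per (metric, time slot); a later arrival replaces an earlier one."""

    def __init__(self, step: int = 60) -> None:
        self.step = step  # seconds per storage slot (assumed retention step)
        self._points: Dict[Tuple[str, int], float] = {}

    def add(self, metric: str, timestamp: int, value: float) -> None:
        # Align the timestamp to the start of its slot, then overwrite.
        slot = timestamp - (timestamp % self.step)
        self._points[(metric, slot)] = value

    def drain(self) -> List[Tuple[str, int, float]]:
        """Return buffered points sorted by metric and slot, and empty the buffer."""
        points = sorted((m, t, v) for (m, t), v in self._points.items())
        self._points.clear()
        return points


if __name__ == "__main__":
    buf = LastInWinsBuffer(step=60)
    buf.add("chart.users.login", 125, 1.0)
    buf.add("chart.users.login", 130, 2.0)  # same 60-second slot: replaces 1.0
    print(buf.drain())
```

Under this reading, two sources writing the same metric in the same interval would silently overwrite each other. That is one reason to route every metric to exactly one aggregation point upstream.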

Page 22: Highly Available Graphite

graphite-web

Graphite-web is stateless

All state is contained within carbon-cache

Reading data out from a highly available, scalable graphite installation is the same as reading from a single server

Use the same ELB as the ha-receiver

Page 23: Highly Available Graphite

Nark

Nark is stateless

All state is contained in MySQL and Graphite

Nark will be no more highly available than your MySQL and Graphite installations

Use an ELB, an autoscale group, and a multi-AZ RDS instance

Page 24: Highly Available Graphite

Recap

Page 25: Highly Available Graphite

Questions?

Feature Requests?

Thanks For Your Time

Page 26: Highly Available Graphite

Join The Team

• Building the next generation of collaborative web applications

• VC funded

• High growth rate

• Profitable

• Graduates from Harvard, MIT, Stanford

• Former Google, Amazon, Microsoft employees

https://www.golucid.co/jobs