22
Large Scale OpenAFS Performance Monitoring with Graphite Sine Nomine Associates August 20, 2015 Large Scale OpenAFS Performance Monitoring with Graphite 1 of 22

Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Large Scale OpenAFS Performance Monitoringwith Graphite

Sine Nomine Associates

August 20, 2015

Large Scale OpenAFS Performance Monitoring with Graphite 1 of 22

Page 2: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Graphite

• Store and graph time-series metrics

• Clustering features for scaling

• Metric namespace

• Render API

• Python stack

Large Scale OpenAFS Performance Monitoring with Graphite 2 of 22

Page 3: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Graphite

• Apache License

• Widely deployed

• Many tools that work with graphite

• I/O intensive (one file per metric!)

Large Scale OpenAFS Performance Monitoring with Graphite 3 of 22

Page 4: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Graphite

• Carbon - daemons to accept and store metrics

• Whisper - library/file format to store time-series metrics

• Graphite - web app provides render API and basicdashboard

Large Scale OpenAFS Performance Monitoring with Graphite 4 of 22

Page 5: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Deployment Experience

• Deployed to a separate server

• Partition for whisper

• Graphite RPM install almost worked, ymmv• some django setup headaches• several small patches needed

• Configure carbon data retention policies

Large Scale OpenAFS Performance Monitoring with Graphite 5 of 22

Page 6: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Feeding AFS metrics to Graphite

Large Scale OpenAFS Performance Monitoring with Graphite 6 of 22

Page 7: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Feeding metrics to graphite

• It is very easy to feed metrics to graphite

• Lots of tools to feed system metrics

• Metrics are sent to carbon via tcp (or udp)

• One metric per line: (path, value, timestamp)

afs.example.afs01.rx.callswaited 31324 1420520400

ImportantData points should be feed to graphite at the same rate as thehighest data retention frequency!

Large Scale OpenAFS Performance Monitoring with Graphite 7 of 22

Page 8: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Gathering AFS fileserver metrics

• rxdebug: rx GetServerDebug() rx GetServerStats()

• vice stats GetStatistics64

• xstats RXAFS GetXStats()

• xstat collections• 0: not implemented on the fileserver• 1: partial performance stats (subset of 2)• 2: performance stats• 3: callback stats

• vos partition information

• audit log fids and hosts• sys-v msg queue

Large Scale OpenAFS Performance Monitoring with Graphite 8 of 22

Page 9: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Carbon Configuration

• Retention buckets• regex of paths• frequency:duration

• Aggregation methods• regex of metric• how to rollup data

ImportantUse validate-storage-schemas to check retention policies.

Large Scale OpenAFS Performance Monitoring with Graphite 9 of 22

Page 10: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Carbon Aggregation

• sum for counters (rx calls waited, packets sent)

• average for gauges (size, used percent)

ImportantIf you have more than one retention bucket, you must providethe method to aggregate data! The carbon default is average(gauge). Most afs metrics are counts.

Large Scale OpenAFS Performance Monitoring with Graphite 10 of 22

Page 11: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Metric Namespace

• Decide on metric namespace conventions early

• Changes later will break render API calls

• Dot is reserved as a separator!

Example:

afs.<cellname>.<server>.part.<part>.<metric>

Large Scale OpenAFS Performance Monitoring with Graphite 11 of 22

Page 12: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Whisper files

• Metrics are stored in regular files in a binary format.

• Makes it easy to archive the whole set

• Nice for ”offline” analysis

• Whisper tools are handy to dump header info and raw data!

whisper-dump /var/lib/carbon/whisper/afs/example/foo/bar.wsp

Large Scale OpenAFS Performance Monitoring with Graphite 12 of 22

Page 13: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Render API

• The render API is the heart of graphite!

• Rich set of functions• combine, transform, calculate, filter

• Rest API

• Json for data dumps

• Create and share a set of standard render calls

• Ad hoc exploration

Large Scale OpenAFS Performance Monitoring with Graphite 13 of 22

Page 14: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Render Example

http://graphite/render?

title=Fetched bytes per second

from=-14day

until=now

target=

aliasByNode(

highestAverage(

scaleToSeconds(

nonNegativeDerivative(

afs.example.*.stats.xfer.FetchData.bytes.sum),1)),2)

Large Scale OpenAFS Performance Monitoring with Graphite 14 of 22

Page 15: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Grafana

• Popular optional client dashboard

• Easy to deploy

• Generally nicer than the built-in graphite dashboard

• Great for ad hoc graphs and exploration

Large Scale OpenAFS Performance Monitoring with Graphite 15 of 22

Page 16: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Example

Large Scale OpenAFS Performance Monitoring with Graphite 16 of 22

Page 17: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Example

Large Scale OpenAFS Performance Monitoring with Graphite 17 of 22

Page 18: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Example

Large Scale OpenAFS Performance Monitoring with Graphite 18 of 22

Page 19: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Example

Large Scale OpenAFS Performance Monitoring with Graphite 19 of 22

Page 20: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Example

Large Scale OpenAFS Performance Monitoring with Graphite 20 of 22

Page 21: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Example

Large Scale OpenAFS Performance Monitoring with Graphite 21 of 22

Page 22: Large Scale OpenAFS Performance Monitoring with Graphiteworkshop.openafs.org/afsbpw15/talks/thursday/meffie-graphite.pdf · Graphite Apache License Widely deployed Many tools that

Questions

This slide intentionally left blank.

Large Scale OpenAFS Performance Monitoring with Graphite 22 of 22