collectd & PostgreSQL
Mark Wong ([email protected], [email protected])
PDXPUG, November 17, 2011



DESCRIPTION

This presentation is primarily focused on how to use collectd (http://collectd.org/) to gather data from the Postgres statistics tables. Examples of how to use collectd with Postgres will be shown. There is some hackery involved to make collectd do a little more and collect more meaningful data from Postgres. These small patches will be explored. A small portion of the discussion will be about how to visualize the data.


Page 1: collectd & PostgreSQL

collectd & PostgreSQL

Mark Wong / [email protected] / [email protected]

PDXPUG

November 17, 2011

Page 2: collectd & PostgreSQL

My Story

• How did I get to collectd?

• What is collectd

• Hacking collectd

• Using collectd with Postgres

• Visualizing the data


Page 3: collectd & PostgreSQL

Brief background

• Working at a little company called Emma http://myemma.com

• Collect performance data from production systems


Page 4: collectd & PostgreSQL

What did we have?

• A database with over 1 million database objects
  • >500,000 tables
  • >1,000,000 indexes

• Tables alone generate 11,000,000 data points per sample


Page 5: collectd & PostgreSQL

What did we try?

Only free things:

• Cacti http://www.cacti.net/

• Ganglia http://ganglia.info/

• Munin http://munin-monitoring.org/

• Reconnoiter https://labs.omniti.com/labs/reconnoiter

• Zenoss http://community.zenoss.org/


Page 6: collectd & PostgreSQL

What doesn’t work

Dependency on RRDtool; can’t handle more than hundreds of thousands of metrics (Application Buffer-Cache Management for Performance: Running the World’s Largest MRTG by David Plonka, Archit Gupta and Dale Carder, LISA 2007):

• Cacti

• Ganglia

• Munin

• Reconnoiter

• Zenoss


Page 7: collectd & PostgreSQL

Reconnoiter almost worked for us

Pros:

• Write your own SQL queries to collect data from Postgres

• Used Postgres instead of RRDtool for storing data

• JavaScript-based on-the-fly charting

• Support for integrating many other health and stats collection solutions

Cons:

• Data collection still couldn’t keep up; maybe needed more tuning

• Faster hardware? (we were using VMs)

• More hardware? (scale out MQ processes)


Page 8: collectd & PostgreSQL

Couldn’t bring myself to try anything else

• Hands were tied; no resources were available to help move forward.

• Can we build something lightweight?

• Played with collectd (http://collectd.org/) while evaluating Reconnoiter


Page 9: collectd & PostgreSQL

What is collectd?

collectd is a daemon which collects system performance statistics periodically and provides mechanisms to store the values in a variety of ways, for example in RRD files.

http://collectd.org/


Page 10: collectd & PostgreSQL

Does this look familiar?

Note: RRDtool is an option, not a requirement.

Page 11: collectd & PostgreSQL

What is special about collectd?

From their web site:

• it’s written in C for performance and portability
• includes optimizations and features to handle hundreds of thousands of data sets
• PostgreSQL plugin enables querying the database

• Can collect most operating system statistics (I say “most” because I don’t know if anything is missing)

• Over 90 total plugins: http://collectd.org/wiki/index.php/Table_of_Plugins


Page 12: collectd & PostgreSQL

collectd data description

• time - when the data was collected

• interval - frequency of data collection

• host - server hostname

• plugin - collectd plugin used

• plugin instance - additional plugin information

• type - type of data collected for the set of values

• type instance - unique identifier of the metric

• dsnames - names for the values collected

• dstypes - type of data for values collected (e.g. counter, gauge, etc.)

• values - array of values collected


Page 13: collectd & PostgreSQL

PostgreSQL plugin configuration

Define custom queries in collectd.conf:

LoadPlugin postgresql

<Plugin postgresql>
  <Query magic>
    Statement "SELECT magic FROM wizard;"
    <Result>
      Type gauge
      InstancePrefix "magic"
      ValuesFrom magic
    </Result>
  </Query>
  ...


Page 14: collectd & PostgreSQL

. . . per database.

  ...
  <Database bar>
    Interval 60
    Service "service_name"
    Query backend # predefined
    Query magic_tickets
  </Database>
</Plugin>

Full details at http://collectd.org/wiki/index.php/Plugin:PostgreSQL


Page 15: collectd & PostgreSQL

Hurdles

More meta data:

• Need a way to save schema, table, and index names; can’t differentiate stats between tables and indexes

• Basic support for meta data exists in collectd but is mostly unused

• How to store data in something other than RRDtool


Page 16: collectd & PostgreSQL

Wanted: additional meta data

Hack the PostgreSQL plugin to create meta data for:

• database - database name (maybe not needed, same as plugin instance)

• schemaname - schema name

• tablename - table name

• indexname - index name

• metric - e.g. blks_hit, blks_read, seq_scan, etc.


Page 17: collectd & PostgreSQL

Another database query for collecting a table statistic

<Query table_stats>
  Statement "SELECT schemaname, relname, seq_scan FROM pg_stat_all_tables;"
</Query>


Page 18: collectd & PostgreSQL

Identify the data

<Result>
  Type counter
  InstancePrefix "seq_scan"
  InstancesFrom "schemaname" "relname"
  ValuesFrom "seq_scan"
</Result>


Page 19: collectd & PostgreSQL

Meta data specific parameters

<Database postgres>
  Host "localhost"
  Query table_stats
  SchemanameColumn 0
  TablenameColumn 1
</Database>

Note: The database name is set by what is specified in the <Database> tag, if it is not retrieved by the query.
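Page 16 also lists index names as wanted meta data, but the deck only walks through table statistics. A hypothetical companion query (my sketch, not from the slides) against pg_stat_all_indexes would follow the same pattern:

-- Hypothetical index-statistics query; it would be wrapped in a
-- Statement "..." inside a <Query index_stats> block, like table_stats above.
SELECT schemaname, relname, indexrelname, idx_scan
  FROM pg_stat_all_indexes;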


Page 20: collectd & PostgreSQL

Example data

• time: 2011-10-20 18:04:17-05

• interval: 300

• host: pong.int

• plugin: postgresql

• plugin instance: sandbox

• type: counter

• type instance: seq_scan-pg_catalog-pg_class

• dsnames: {value}
• dstypes: {counter}
• values: {249873}


Page 21: collectd & PostgreSQL

Example meta data

• database: sandbox

• schemaname: pg_catalog

• tablename: pg_class

• indexname: (empty; this sample is a table statistic)

• metric: seq_scan


Page 22: collectd & PostgreSQL

Now what?

Hands were tied (I think I mentioned that earlier); open-sourced the work to date:

• collectd forked with patcheshttps://github.com/mwongatemma/collectd

• YAMS https://github.com/myemma/yams


Page 23: collectd & PostgreSQL

Yet Another Monitoring System


Page 24: collectd & PostgreSQL

Switching hats and boosting code

Using extracurricular time on equipment donated to Postgres by Sun, IBM, and HP to continue proving out the collectd changes.


Page 25: collectd & PostgreSQL

How am I going to move the data?

Options from the available write plugins; guess which I used:

• Carbon - Graphite’s storage API to Whisper http://collectd.org/wiki/index.php/Plugin:Carbon

• CSV http://collectd.org/wiki/index.php/Plugin:CSV

• Network - Send/Receive to other collectd daemons http://collectd.org/wiki/index.php/Plugin:Network

• RRDCacheD http://collectd.org/wiki/index.php/Plugin:RRDCacheD

• RRDtool http://collectd.org/wiki/index.php/Plugin:RRDtool

• SysLog http://collectd.org/wiki/index.php/Plugin:SysLog

• UnixSock http://collectd.org/wiki/index.php/Plugin:UnixSock

• Write HTTP - PUTVAL (plain text), JSON http://collectd.org/wiki/index.php/Plugin:Write_HTTP


Page 26: collectd & PostgreSQL

Process of elimination

If RRDtool (written in C) can’t handle massive volumes of data, a Python RRD-like database probably can’t either:

• Carbon

• CSV

• Network

• RRDCacheD

• RRDtool

• SysLog

• UnixSock

• Write HTTP - PUTVAL (plain text), JSON


Page 27: collectd & PostgreSQL

Process of elimination

Writing to other collectd daemons or just locally doesn’t seem useful at the moment:

• CSV

• Network

• SysLog

• UnixSock

• Write HTTP - PUTVAL (plain text), JSON


Page 28: collectd & PostgreSQL

Process of elimination

Let’s try CouchDB’s RESTful JSON API!

• CSV

• SysLog

• Write HTTP - PUTVAL (plain text), JSON


Page 29: collectd & PostgreSQL

Random: What Write HTTP PUTVAL data looks like

Note: Each PUTVAL is a single line but is broken up into two lines to fit onto the slide.

PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets
    interval=10 1251533299:197141504:175136768

PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_ops
    interval=10 1251533299:10765:12858

PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_time
    interval=10 1251533299:5:140

PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_merged
    interval=10 1251533299:4658:29899


Page 30: collectd & PostgreSQL

Random: What the Write HTTP JSON data looks like

Note: Write HTTP packs as much data as it can into a 4KB buffer.

[ {
    "values": [197141504, 175136768],
    "dstypes": ["counter", "counter"],
    "dsnames": ["read", "write"],
    "time": 1251533299,
    "interval": 10,
    "host": "leeloo.lan.home.verplant.org",
    "plugin": "disk",
    "plugin_instance": "sda",
    "type": "disk_octets",
    "type_instance": ""
}, ... ]


Page 31: collectd & PostgreSQL

I didn’t know anything about CouchDB at the time

• Query interface not really suited for retrieving data to visualize

• Insert performance not suited for millions of metrics of data over short intervals (can insert the same data into Postgres several orders of magnitude faster)


Page 32: collectd & PostgreSQL

Now where am I going to put the data?

Hoping that using Write HTTP is still a good choice:

• Write an ETL
  • Table partitioning logic; creation of partition tables
  • Transform JSON data into INSERT statements (sketched below)

• Use Postgres
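As a rough illustration (mine, not the talk's actual ETL output), the JSON record from page 30 would transform into an INSERT against the value_list table defined on the next page, roughly:

-- Sketch of the transformed statement for the page-30 JSON sample; the real
-- ETL would presumably target the matching child partition instead.
INSERT INTO collectd.value_list
    ("time", "interval", host, plugin, plugin_instance,
     type, type_instance, dsnames, dstypes, "values")
VALUES
    (to_timestamp(1251533299), 10, 'leeloo.lan.home.verplant.org', 'disk', 'sda',
     'disk_octets', '', '{read,write}', '{counter,counter}', '{197141504,175136768}');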


Page 33: collectd & PostgreSQL

Database design

Table "collectd.value_list"

Column | Type | Modifiers

-----------------+--------------------------+-----------

time | timestamp with time zone | not null

interval | integer | not null

host | character varying(64) | not null

plugin | character varying(64) | not null

plugin_instance | character varying(64) |

type | character varying(64) | not null

type_instance | character varying(64) |

dsnames | character varying(512)[] | not null

dstypes | character varying(8)[] | not null

values | numeric[] | not null


Page 34: collectd & PostgreSQL

Take advantage of partitioning

At least table inheritance, in Postgres’ case; partition the data by plugin.
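A minimal sketch of the inheritance DDL behind the child table on the next page (my reconstruction from the \d output, not code taken from the talk):

-- Reconstructed child table: inherits all value_list columns, adds the
-- PostgreSQL-plugin meta data, and constrains the partition to one plugin.
CREATE TABLE collectd.vl_postgresql (
    database   character varying(64) NOT NULL,
    schemaname character varying(64),
    tablename  character varying(64),
    indexname  character varying(64),
    metric     character varying(64) NOT NULL,
    CHECK (plugin = 'postgresql')
) INHERITS (collectd.value_list);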


Page 35: collectd & PostgreSQL

Child table

Table "collectd.vl_postgresql"

Column | Type | Modifiers

-----------------+--------------------------+-----------

...

database | character varying(64) | not null

schemaname | character varying(64) |

tablename | character varying(64) |

indexname | character varying(64) |

metric | character varying(64) | not null

Check constraints:

"vl_postgresql_plugin_check" CHECK (plugin::text =

’postgresql’::text)

Inherits: value_list
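For concreteness, the example row from pages 20-21 would land in this child table roughly as follows (an illustration against the schema above, not the actual ETL output):

-- The page 20/21 sample: one seq_scan counter reading for pg_catalog.pg_class.
INSERT INTO collectd.vl_postgresql
    ("time", "interval", host, plugin, plugin_instance,
     type, type_instance, dsnames, dstypes, "values",
     database, schemaname, tablename, metric)
VALUES
    ('2011-10-20 18:04:17-05', 300, 'pong.int', 'postgresql', 'sandbox',
     'counter', 'seq_scan-pg_catalog-pg_class', '{value}', '{counter}', '{249873}',
     'sandbox', 'pg_catalog', 'pg_class', 'seq_scan');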


Page 36: collectd & PostgreSQL

How much partitioning?

Lots of straightforward options:

• Date

• Database

• Schema

• Table

• Index

• Metric


Page 37: collectd & PostgreSQL

Back to the ETL

Parameters were set for the fastest path to a working prototype:

• Keep using HTTP POST (Write HTTP plugin) for the HTTP protocol and JSON

• Use Python for its built-in HTTP server and JSON parsing (Emma is primarily a Python shop)

• Use SQLAlchemy/psycopg2


Page 38: collectd & PostgreSQL

Back again to the ETL

Python didn’t perform; the combination of JSON parsing, data transformation, and INSERT performance was still several orders of magnitude below acceptable levels:

• redis to queue data to transform

• lighttpd for the HTTP interface

• fastcgi C program to push things to redis

• multi-threaded C program using libpq for the Postgres API
  • pop data out of redis
  • table partition creation logic
  • transform JSON data into INSERT statements


Page 39: collectd & PostgreSQL

Success?

• Table statistics for 1 million tables are collected in approximately 12 minutes.

• Is that acceptable?

• Can we go faster?


Page 40: collectd & PostgreSQL

If you don’t have millions of data points

Easier ways to visualize the data:

• RRDtool

• RRDtool-compatible front-ends http://collectd.org/wiki/index.php/List_of_front-ends

• Graphite with the Carbon and Whisper combo http://graphite.wikidot.com/

• Reconnoiter
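For the Postgres-backed path built in this talk, one way to pull a counter metric back out as a per-second rate for charting might look like the following (my sketch, not a query from the slides):

-- Sketch: per-interval rate of sequential scans on pg_catalog.pg_class,
-- ignoring counter wraparound and resets for simplicity.
SELECT "time",
       ("values"[1] - lag("values"[1]) OVER w)
           / extract(epoch FROM ("time" - lag("time") OVER w)) AS seq_scans_per_sec
  FROM collectd.vl_postgresql
 WHERE database = 'sandbox'
   AND schemaname = 'pg_catalog'
   AND tablename = 'pg_class'
   AND metric = 'seq_scan'
WINDOW w AS (ORDER BY "time")
 ORDER BY "time";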


Page 41: collectd & PostgreSQL

__ __

/ \~~~/ \ . o O ( Thank you! )

,----( oo )

/ \__ __/

/| (\ |(

^ \ /___\ /\ |

|__| |__|-"


Page 42: collectd & PostgreSQL

Acknowledgements

Hayley Jane Wakenshaw

__ __

/ \~~~/ \

,----( oo )

/ \__ __/

/| (\ |(

^ \ /___\ /\ |

|__| |__|-"


Page 43: collectd & PostgreSQL

License

This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by/3.0/us/; or, (b) send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA.
