DESCRIPTION
A monitoring system is arguably the most crucial system to have in place when administering and tuning the performance of any database system. DBAs also find themselves with a variety of monitoring systems and plugins to choose from, ranging from small scripts in cron to complex data collection systems. In this talk, I’ll discuss how Box shifted from the Cacti monitoring system and various other shell scripts to OpenTSDB, and the changes we made to our servers and to our daily interaction with monitoring to increase our agility in identifying and addressing changes in database behavior.
Monitoring MySQL with OpenTSDB
Percona Live 2013
Geoffrey Anderson, Box Inc.
@geodbz
Who
Geoffrey Anderson
• Database Operations Engineer @ Box, Inc.
• a.k.a. DBA
• Tooling for MySQL and HBase
• #DBHangOps
The Situation
Then You Get More Servers
Enter OpenTSDB
OpenTSDB is...
• Distributed
• Scalable
• Time Series Database
• Runs on HBase
• Created by Benoit Sigoure
[Architecture diagram. Labels: mydb.example.com, Push Metrics, HAProxy, TSD for Storing, HBase, TSD for Querying, Query via API, fe1.example.com]
Why OpenTSDB?
• FAST
• EASY to Scale
• EASY to Populate
• EASY to collect data
• EASY to Query
Collecting Data
#!/usr/bin/env bash
timestamp=$(date +%s)
mysql -ss -e "SHOW GLOBAL STATUS" | while read var val
do
  echo "mysql.$var $timestamp $val host=$HOSTNAME"
done

[email protected]:~$ ./mysql_collector.sh
mysql.Aborted_connects 1366399993 0 host=mydb.example.com
mysql.Binlog_cache_disk_use 1366399993 0 host=mydb.example.com
mysql.Binlog_cache_use 1366399993 0 host=mydb.example.com
mysql.Binlog_stmt_cache_disk_use 1366399993 0 host=mydb.example.com
mysql.Binlog_stmt_cache_use 1366399993 0 host=mydb.example.com
mysql.Bytes_received 1366399993 19453687 host=mydb.example.com
mysql.Bytes_sent 1366399993 1238166682 host=mydb.example.com
mysql.Com_admin_commands 1366399993 1 host=mydb.example.com
mysql.Com_assign_to_keycache 1366399993 0 host=mydb.example.com
...
Example: mysql_collector.sh
Line format: metric name, timestamp, value, “tags” (key=val)
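The line format above can also be produced from Python. The following is a minimal sketch, not the collector used at Box: the function names are hypothetical, the input is assumed to be the tab-separated text that `mysql -ss -e "SHOW GLOBAL STATUS"` prints, and rows whose value does not parse as a number are skipped on the assumption that OpenTSDB only stores numeric values (the bash script prints every row).

```python
def format_datapoint(metric, timestamp, value, **tags):
    """Return one 'metric timestamp value key=val ...' line."""
    if not tags:
        # OpenTSDB expects at least one tag on every data point.
        raise ValueError("at least one tag is required")
    tag_text = " ".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"{metric} {int(timestamp)} {value} {tag_text}"


def status_to_datapoints(status_text, timestamp, host):
    """Convert 'name<TAB>value' status rows into data point lines."""
    lines = []
    for row in status_text.splitlines():
        parts = row.split(None, 1)
        if len(parts) != 2:
            continue  # row has a name but an empty value
        name, raw = parts[0], parts[1].strip()
        try:
            float(raw)
        except ValueError:
            continue  # non-numeric status value such as ON/OFF
        lines.append(format_datapoint(f"mysql.{name}", timestamp, raw, host=host))
    return lines
```

Calling `status_to_datapoints(text, time.time(), socket.gethostname())` and printing each returned line would give output in the same shape as the terminal listing shown earlier.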
* * * * * mysql_collector.sh | nc opentsdb.example.com 4242
Example: adding a cron for OpenTSDB
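The same push can be sketched in Python instead of `nc`. This is an illustration under assumptions, not the setup from the talk: the function names are hypothetical, and the sketch prefixes every line with `put`, which is the command the TSD's line-based interface expects as far as I know (tcollector adds that prefix itself, so collector scripts can print bare data point lines). The host and port are the example values from the cron line.

```python
import socket


def to_put_commands(datapoints):
    """Prefix each non-empty data point line with 'put', one per line."""
    return "".join(f"put {line.strip()}\n" for line in datapoints if line.strip())


def send_datapoints(datapoints, host="opentsdb.example.com", port=4242, timeout=5.0):
    """Open a TCP connection to a TSD and send all data points at once."""
    payload = to_put_commands(datapoints).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
```

Keeping the formatting step (`to_put_commands`) separate from the network step makes it possible to check the payload without a running TSD.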
[email protected]:tcollector$ tree
.
|-- collectors
|   |-- 0                        <- Run forever
|   |   |-- ifstat.py
|   |   |-- iostat.py
|   |   |-- procnettcp.py
|   |   |-- procstats.py
|   |-- 15                       <- Run every 15 seconds
|   |   `-- dfstat.py
|   |-- 30                       <- Run every 30 seconds
|   |   |-- mysql_collector.sh
|   |-- 300                      <- Run every 5 minutes
|   |   `-- ptTcpModel.sh
|   `-- etc
|       |-- config.py
|-- config
|-- startstop
`-- tcollector.py
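A collector placed in the `0` directory is expected to keep running and print data points as it goes, while collectors in the numbered directories are started once per interval. The following is a rough sketch of what a long-lived collector could look like. It is not one of the collectors listed above: the function names, the `read_sample` callback, and the `max_rounds` parameter are hypothetical, and the `host` tag is written explicitly rather than relying on tcollector to add it.

```python
import socket
import sys
import time


def sample_lines(sample, host, now):
    """Turn a {metric: value} dict into sorted data point lines."""
    return [
        f"{metric} {int(now)} {value} host={host}"
        for metric, value in sorted(sample.items())
    ]


def run_forever(read_sample, interval=15, host=None, max_rounds=None):
    """Print one batch of data points every `interval` seconds.

    read_sample is a callable returning {metric: value}; max_rounds exists
    only so the loop can be stopped in experiments (None means no limit).
    """
    host = host or socket.gethostname()
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for line in sample_lines(read_sample(), host, time.time()):
            print(line)
        sys.stdout.flush()  # the parent process reads stdout, so do not buffer
        rounds += 1
        time.sleep(interval)
```

A script in the `0` directory could end with a call such as `run_forever(lambda: {"demo.value": 1})`, where `demo.value` stands in for a real measurement.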
Querying Data
http://opentsdb.example.com/#start=2013/04/10-07:32:29&end=2013/04/10-07:57:57&m=sum:proc.stat.cpu.percentage_idle{host=db22}&o=axis x1y1&m=sum:db.threads_running{host=db22}&o=axis x1y2&ylabel=CPU idle&y2label=Threads Running&yrange=[0:]&wxh=1475x600&png
http://opentsdb.example.com/q?start=2013/04/10-07:32:29&end=2013/04/10-07:57:57&m=sum:proc.stat.cpu.percentage_idle{host=db22}&o=axis x1y1&m=sum:db.threads_running{host=db22}&o=axis x1y2&ylabel=CPU idle&y2label=Threads Running&yrange=[0:]&ascii
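The second URL asks for plain-text output (`&ascii`) instead of a PNG, which makes the data easy to consume from a script. Below is a minimal sketch of building such a URL and parsing the response. It assumes the ASCII response has one `metric timestamp value key=val ...` line per data point; the function names are hypothetical and no HTTP request is made here.

```python
from urllib.parse import urlencode


def build_query_url(base, start, end, metrics, ascii_output=True):
    """Build a /q URL with one m= parameter per metric expression."""
    params = [("start", start), ("end", end)]
    params += [("m", expression) for expression in metrics]
    # Keep the characters used by OpenTSDB expressions readable in the URL.
    query = urlencode(params, safe="/:{}=,*|")
    if ascii_output:
        query += "&ascii"
    return f"{base}/q?{query}"


def parse_ascii(body):
    """Parse ASCII output into (metric, timestamp, value, tags) tuples."""
    points = []
    for line in body.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or malformed line
        metric, timestamp, value = fields[:3]
        tags = dict(field.split("=", 1) for field in fields[3:] if "=" in field)
        points.append((metric, int(timestamp), float(value), tags))
    return points
```

The URL from `build_query_url` can be fetched with any HTTP client (for example `urllib.request.urlopen`) and the decoded body passed to `parse_ascii`.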
Leveraging OpenTSDB For MySQL
user_statistics monitoring
table_statistics monitoring
Table Info from I_S
SELECT *, DATA_LENGTH+INDEX_LENGTH AS TOTAL_LENGTH FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA NOT IN ('PERFORMANCE_SCHEMA','INFORMATION_SCHEMA')
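Rows from this query map naturally onto tagged data points: one metric per size column, with the schema and table as tags. The sketch below shows one possible mapping. The metric names (`mysql.table.*`), the tag keys, and the function names are assumptions for illustration, not the naming used at Box. Rows are taken as dictionaries, as a dictionary-style cursor from a MySQL driver would return them, and characters outside the set OpenTSDB accepts in tag values (to my knowledge letters, digits, `-`, `_`, `.`, `/`) are replaced with `_`.

```python
import re

SIZE_COLUMNS = ("DATA_LENGTH", "INDEX_LENGTH", "TOTAL_LENGTH", "TABLE_ROWS")


def sanitize_tag(value):
    """Replace characters that are not safe in a tag value with '_'."""
    return re.sub(r"[^a-zA-Z0-9\-_./]", "_", str(value))


def table_rows_to_datapoints(rows, timestamp, host):
    """Emit one data point per table and size column."""
    lines = []
    for row in rows:
        tags = (
            f"host={sanitize_tag(host)} "
            f"schema={sanitize_tag(row['TABLE_SCHEMA'])} "
            f"table={sanitize_tag(row['TABLE_NAME'])}"
        )
        for column in SIZE_COLUMNS:
            value = row.get(column)
            if value is None:
                continue  # some rows (e.g. views) have NULL sizes
            metric = f"mysql.table.{column.lower()}"
            lines.append(f"{metric} {int(timestamp)} {int(value)} {tags}")
    return lines
```

Tagging by schema and table is what allows a single query to aggregate sizes across a whole schema or host, or to single out one table.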
Query Throughput
And other “common” metrics
• Various MySQL status counters
  • QPS (questions)
  • Threads connected
  • Temporary tables on disk
  • Etc.
• Various server statistics
  • %CPU Idle
  • Free disk space
  • I/O utilization
  • Network traffic
  • Etc.
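Many of the status counters above, such as Questions, only ever increase, so a figure like QPS is the rate of change between two samples. OpenTSDB can compute this at query time (a `rate` flag in the metric expression, e.g. `sum:rate:mysql.Questions`, assuming that metric name). The sketch below shows the same arithmetic by hand so the meaning is clear. The function name is hypothetical, and skipping samples where the counter goes backwards is one possible way to handle a server restart.

```python
def per_second_rate(samples):
    """Convert (timestamp, counter) samples into (timestamp, rate) pairs.

    samples must be sorted by timestamp. Pairs where time does not advance
    or the counter decreases (e.g. after a restart) are skipped.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0 or v1 < v0:
            continue
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates
```

For example, a Questions counter that goes from 100 to 400 over 30 seconds corresponds to (400 - 100) / 30 = 10 queries per second.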
Future collectors
• pt-query-digest/mysql slow query statistics
• Data from “show engine innodb status” (that is missing from counters)
• PERFORMANCE_SCHEMA (MySQL 5.6+)
  • Query statistics
  • Processlist information
  • Background thread information
How does this change things?
In all seriousness, though...
• Easily see aggregate graphs
• Easily build graphs on-the-fly
• Full granularity forever
• API request for raw data
• Cluster-wide nagios checks with check_tsd

Challenges Switching
• Aggregates are the default
• Mouse-zooming (patched!)
• Auto-suggest for metrics
• “The graphs aren’t pretty”
• Migrating from proof of concept
  • Plan for 3+ machines
  • Data pruning may be required
Some Quick Numbers: OpenTSDB @ Box
• 21,294 metrics
• 72 tag keys
• 5,145,745 tag values
• 90% of interactive graphs return in <300ms
Next Steps
Enjoy #PerconaLive 2013
We’re hiring!
https://www.box.com/about-us/careers/
[email protected]