Making Operations Visible
Nick Galbreath ニック ガルブレス
DevOpsDays Tokyo 2013
http://slidesha.re/1h9Aqye
http://www.client9.com/
It's also on video!
http://bit.ly/1gaEmDS
Nick Galbreath http://client9.com/20130501 @ngalbreath
Who is nickg?
www.client9.com
Online Advertising Infrastructure
http://www.iponweb.jp/
Russia, Moscow, Tokyo
Continuous Deployment
• In 2012, I spoke many times about continuous deployment.
• But moving from release cycles to continuous deployment is too big a change for most organizations, and they don't have the tools to do it.
Goal
• I'm hoping that adding new metrics to the application becomes so addictive that you'll want to shorten release cycles.
What is DevOps?
• Puppet, Chef, Ansible?
• GitHub? AWS? The Cloud?
• Continuous Deployment?
Yes, but these are tools. Great tools.
It's About Communication
• Between machines
• Between team members
• Between Dev and Ops
But in many companies there is a bigger problem
You're Invisible
• If you are in Business, you are invisible to Development and Tech Operations.
• If you are in Operations, you are invisible to Business and Development
• If you are in Development, you are invisible to Business and Operations.
Invisible Things Aren't Valued
Developer
• "I don't know what my code will do in production, so I let ops deal with it."
• "Why doesn't ops fix these problems?"
• "What does Ops do all day?"
Business
• "Why do I have to wait until the end of the month for a report?"
• "Did last week's release change anything?"
• "Why don't they understand the impact of that bug, outage, etc.?"
Operations
• "Why are they always bothering me?"
• "I've got work to do!"
• "Why do we have to do another release again? Can't developers do a better job?"
• "What does this company do?" (really)
This is really destructive
To you. To your team. To your company.
All of This Can Be Fixed By Making Operations Visible with Data
Not just technical operations but company operations.
So Why Not Expose This Data?
Here's a list of excuses I've heard
Your company is full of data!
"But I already have graphing in my alerting system"
• Maybe. But it's junk
• Can't share
• Can't do data mash-ups
• Can't do data transformations
"They wouldn't understand."
• "They won't understand the data so what's the point of sharing it."
• First, "they" probably do understand. And the more people looking at ops metrics, the better.
• Us vs. Them = Fail.
"They might break something."
• "The data is in our alerting system, we don't want you to break it."
• It assumes "they" are incompetent or malicious. Learn to trust.
"It's not your job, so you don't need to know."
"That information isn't important."
• This excuse is typically caused by fear.
• Why are you deciding what's important?
"I'm not making another system; duplicating data is bad."
• For operational metrics, it's perfectly OK to have a redundant copy of the data.
• Completely different goals.
• Use it as a beta for alerting.
"I'm too busy."
"It's too dangerous."
"I don't know how."
• These are real problems.
• So let's fix it!
One Machine, One Day, One Person Challenge!
Let's get 100% of operational metrics in, and enable the application to make and share new metrics on demand without any help from you.
Graphite
• https://github.com/graphite-project
• http://graphite.readthedocs.org/
• Similar to RRDTool, Ganglia, Cacti
• Uses specialized data storage
• Uses specialized queries
• Optimized for time series
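Getting data into Graphite can be as simple as its plaintext protocol: one line per data point, `metric.path value timestamp`, sent to Carbon, Graphite's collector. A minimal Python sketch, assuming Carbon listens on its usual plaintext TCP port 2003 (host and port are placeholders for your setup):

```python
import socket
import time
from typing import List, Optional

# Assumption: carbon-cache accepts the plaintext protocol on TCP 2003,
# its usual default. Host and port are placeholders.
CARBON_HOST = "localhost"
CARBON_PORT = 2003


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """One plaintext-protocol line: '<metric.path> <value> <unix timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_metrics(lines: List[str], host: str = CARBON_HOST, port: int = CARBON_PORT) -> None:
    """Write a batch of lines to Carbon over one short-lived TCP connection."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))
```

For example, `format_metric("servers.web01.load", 0.42, 1380000000)` produces `servers.web01.load 0.42 1380000000` plus a newline. The dots in the path become the folder-like hierarchy you browse in Graphite's UI.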
Graphite isn't Perfect
• Documentation isn't great (but getting better)
• A few QA issues
• Somewhat odd stack (Python Twisted, Django)
Graphite Ecosystem
• Flexible input and output
• REST API for graphs
• Simple UI for mashups and dashboards
• 3rd party, custom, client-side dashboards
Makes Sharing Easy
• Do you have an interesting graph? It's just a URL!
• Dashboards are easy since graphs are just URLs. Very easy to make HTML dashboards.
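Because a graph is just a URL, an HTML dashboard can be a handful of `<img>` tags. A small Python sketch using Graphite's `/render` API parameters (`target`, `from`, `width`); the Graphite host name is a hypothetical placeholder:

```python
import html
from urllib.parse import urlencode

# Hypothetical Graphite web URL; replace with your own.
GRAPHITE_URL = "http://graphite.example.com"


def graph_url(targets, params=None, base=GRAPHITE_URL):
    """Build a /render URL. Each target becomes its own 'target=' parameter,
    so several series can share one graph."""
    query = [("target", t) for t in targets]
    query += sorted((params or {}).items())
    return f"{base}/render?{urlencode(query)}"


def html_dashboard(graphs):
    """Turn an ordered {title: graph_url} mapping into a bare HTML page."""
    parts = ["<html><body>"]
    for title, url in graphs.items():
        parts.append(f"<h2>{html.escape(title)}</h2>")
        # '&' in the URL must be escaped as '&amp;' inside an HTML attribute.
        parts.append(f'<img src="{html.escape(url)}">')
    parts.append("</body></html>")
    return "\n".join(parts)
```

Usage might look like `html_dashboard({"Logins": graph_url(["stats.logins"], {"from": "-24h", "width": 800})})`; write the result to a file and share it. The metric name `stats.logins` is illustrative.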
One Machine, One Day!
• A single low-end machine should have capacity for a few thousand metrics per minute from 50+ machines.
• Graphite is not CPU intensive, but needs fast disks and/or more memory.
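Why disks matter can be seen with some back-of-envelope arithmetic. Whisper, Graphite's default storage, preallocates one fixed-size file per metric. This sketch assumes the layout in Whisper's source (a 16-byte header, 12 bytes of info per archive, 12 bytes per point) and a hypothetical retention of one point per minute for a year, across a hypothetical 5,000 metrics:

```python
# Assumptions, from the layout in Whisper's source (whisper.py):
# a 16-byte file header, 12 bytes of info per archive, and 12 bytes per
# stored point (4-byte timestamp + 8-byte double).
HEADER_BYTES = 16
ARCHIVE_INFO_BYTES = 12
POINT_BYTES = 12


def whisper_file_bytes(archives):
    """archives: list of (seconds_per_point, retention_seconds) pairs."""
    points = sum(retention // step for step, retention in archives)
    return HEADER_BYTES + ARCHIVE_INFO_BYTES * len(archives) + POINT_BYTES * points


def updates_per_second(num_metrics, seconds_per_point):
    """Rough upper bound: one file update per metric per interval.
    Carbon's cache can batch several points into one write."""
    return num_metrics / seconds_per_point


ONE_YEAR = 365 * 24 * 3600
per_metric = whisper_file_bytes([(60, ONE_YEAR)])  # 1 point/minute, kept 1 year
total_gb = 5000 * per_metric / 1e9                 # hypothetical 5,000 metrics
```

That works out to roughly 6.3 MB per metric, about 31.5 GB in total, and on the order of 80 small scattered file updates per second: little CPU, but plenty of random I/O, which is where fast disks or enough memory for the page cache help.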
One Day, One Person
• Graphite is not hard to install, but it is a bit messy.
• But it might be as easy as "apt-get install graphite" on your system.
• It would be good to have a workshop or a prebuilt AMI for EC2
• But not today :-(
Operational Stats
• You could parse /proc, ps, df, netstat, etc. and write your own custom scripts...
• ...or use Diamond from Brightcove
• https://github.com/BrightcoveOS/Diamond
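To show what the do-it-yourself route involves (and roughly what a collector automates), here is a minimal Python sketch that turns Linux's `/proc/loadavg` into Graphite plaintext lines. The `servers.<host>.loadavg.*` naming scheme is a hypothetical choice, not Diamond's:

```python
def loadavg_metrics(text, host, timestamp):
    """Turn the contents of /proc/loadavg into Graphite plaintext lines.

    /proc/loadavg looks like '0.42 0.36 0.30 1/234 5678': the 1-, 5- and
    15-minute load averages, running/total scheduling entities, and the
    most recently assigned PID (ignored here).
    """
    one, five, fifteen, procs, _last_pid = text.split()
    running, total = procs.split("/")
    prefix = f"servers.{host}.loadavg"  # hypothetical naming scheme
    values = {
        "01": one,
        "05": five,
        "15": fifteen,
        "processes_running": running,
        "processes_total": total,
    }
    return [f"{prefix}.{name} {value} {timestamp}\n" for name, value in values.items()]


def read_loadavg(path="/proc/loadavg"):
    """Linux only: read the raw text that loadavg_metrics() parses."""
    with open(path) as f:
        return f.read()
```

Multiply this by every file in /proc, every `df` and `netstat` output format, and every daemon's status page, and the appeal of a ready-made collector is clear.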
Metrics in Diamond now
• Apache
• NGINX
• MySQL
• SNMP
• Memory
• CPU
• Disk
• Network
and many more
But what about your applications?
And business metrics?
100% of pure operational metrics are now shared!
Enter StatsD
• https://github.com/etsy/statsd
• Your application sends event data to StatsD in real time, as it happens.
• StatsD collects this data and computes time-series metrics (sum, min, max, average)
• At each flush interval (for example, once a minute), it writes the results to Graphite
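The collect-then-flush cycle can be modelled in a few lines. This is a toy Python sketch, not StatsD's code: it parses StatsD's `name:value|type` line format for counters (`c`) and timers (`ms`) only, ignores sample rates, and uses simplified output names (real StatsD has its own namespaces, plus gauges, sets, and percentiles):

```python
from collections import defaultdict


class MiniAggregator:
    """Toy model of StatsD's flush cycle (not StatsD's actual implementation)."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timers = defaultdict(list)

    def handle(self, packet):
        # Line format '<name>:<value>|<type>[|@rate]'; the sample rate is ignored.
        name, rest = packet.split(":", 1)
        value, kind = rest.split("|")[:2]
        if kind == "c":
            self.counters[name] += int(value)  # assumes integer counts
        elif kind == "ms":
            self.timers[name].append(float(value))

    def flush(self, timestamp):
        """Summarize everything seen since the last flush, then reset."""
        lines = [f"{n} {v} {timestamp}" for n, v in sorted(self.counters.items())]
        for n, vals in sorted(self.timers.items()):
            stats = {"sum": sum(vals), "min": min(vals),
                     "max": max(vals), "mean": sum(vals) / len(vals)}
            lines += [f"{n}.{k} {v} {timestamp}" for k, v in stats.items()]
        self.counters.clear()
        self.timers.clear()
        return lines
```

Feeding it `login:1|c` three times and `deploy:1|c` once, then flushing, yields `deploy 1 <ts>` and `login 3 <ts>`: thousands of individual events become one data point per metric per interval, which is what keeps Graphite's write load small.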
The Magic of UDP
• Your application sends metrics in a UDP packet.
• From the application's side, sending UDP is fire-and-forget: no connection, no exceptions, no timeouts. It cannot cause your application to crash.
• It will not overload your network.
• You may lose metrics, but on an intranet that's rare.
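The fire-and-forget property is easy to see in code. A minimal Python sketch of sending one datagram to StatsD (port 8125 is StatsD's usual default; the address is a placeholder) that deliberately swallows socket errors, so a broken metrics path can never break the application:

```python
import socket

STATSD_ADDR = ("127.0.0.1", 8125)  # 8125 is StatsD's usual default port


def send_udp(payload: bytes, addr=STATSD_ADDR) -> None:
    """Fire-and-forget: one datagram, no handshake, no reply awaited.

    Any OSError (for example a failed DNS lookup) is swallowed, so the
    metrics path can never take the application down with it.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, addr)
    except OSError:
        pass  # dropping a metric is acceptable; crashing is not
```

If nothing is listening, the call still returns immediately; the metric is simply lost, which is exactly the trade-off described above.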
Let's Count Logins!
• Most StatsD client libraries are a single file, pure code with no C extensions, and simple.
• Add one line to your login code.
StatsD::increment('logins');
• That's it!
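A Python equivalent of that one-liner might look like the sketch below. The `StatsD` class and the `login()` function are hypothetical illustrations, not a specific published client; the only protocol detail relied on is StatsD's counter format `<name>:<count>|c`:

```python
import socket


class StatsD:
    """Hypothetical minimal client mirroring the slide's StatsD::increment().

    Sends StatsD counter lines ('<name>:<count>|c') over UDP.
    """

    def __init__(self, host="127.0.0.1", port=8125):
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def increment(self, name, count=1):
        try:
            self._sock.sendto(f"{name}:{count}|c".encode("utf-8"), self._addr)
        except OSError:
            pass  # never let metrics break the login path


stats = StatsD()


def login(user):
    # ... existing (hypothetical) authentication code would go here ...
    stats.increment("logins")  # the one added line
    return True
```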
Events!
• You can also graph low-frequency events.
• Just send another StatsD request in your batch script:
StatsD::increment("deploy", 1);
• Do it on reboots, installs, core dumps.
• New bugs, new hires, new code commits.
• Use drawAsInfinite to display them.
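Overlaying events on a regular metric is just a second `target` in the render URL. `drawAsInfinite()` is a Graphite function that draws any non-zero value as a vertical line. In this Python sketch the metric names assume StatsD's legacy namespace (raw counter totals under `stats_counts.<name>`) and the host is a placeholder:

```python
from urllib.parse import urlencode

# Assumptions: StatsD's legacy namespace ('stats_counts.<name>' holds raw
# counts) and a placeholder Graphite host.
GRAPHITE_URL = "http://graphite.example.com"

targets = [
    "stats_counts.logins",                  # the regular time series
    "drawAsInfinite(stats_counts.deploy)",  # non-zero minutes become vertical lines
]
query = [("target", t) for t in targets] + [("from", "-7d")]
url = f"{GRAPHITE_URL}/render?{urlencode(query)}"
```

Each deploy then shows up as a vertical line across the login graph, which makes "did the last release change anything?" a question anyone can answer at a glance.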
Diagram: three servers each send "login,1" and a deploy script sends "deploy,1" to StatsD, which aggregates them into (login,3), (deploy,1) and writes the result to Graphite.
Measure Anything, Measure Everything http://codeascraft.com/2011/02/15/measure-anything-measure-everything/
Logins By Country!
• get country code from IP address
• make a new metric "login_country" instantly
StatsD::increment('logins');
$kuni = geoip2country($ipv4);
StatsD::increment("logins.$kuni");
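The metric-naming step can be sketched in Python. `geoip2country()` is not defined on the slide, so here it is a hypothetical stub backed by a made-up table over documentation-only address ranges (192.0.2.0/24, 198.51.100.0/24); a real deployment would query a GeoIP database instead:

```python
import ipaddress

# Hypothetical stand-in for a real GeoIP lookup: a made-up table using
# documentation-only address ranges.
_FAKE_GEOIP = {
    ipaddress.ip_network("192.0.2.0/24"): "JP",
    ipaddress.ip_network("198.51.100.0/24"): "RU",
}


def geoip2country(ipv4):
    addr = ipaddress.ip_address(ipv4)
    for net, code in _FAKE_GEOIP.items():
        if addr in net:
            return code
    return "unknown"


def login_metric_names(ipv4):
    """The two counters to bump per login: the total and a per-country child."""
    kuni = geoip2country(ipv4)
    return ["logins", f"logins.{kuni}"]
```

Because Graphite treats dots as hierarchy separators, the per-country counters sit under one node, and a wildcard target such as `logins.*` (prefixed with whatever namespace StatsD adds) graphs every country at once with no further code.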
Make Dashboards
• ...and make frameworks that make creating new dashboards easy.
Default Dashboard
Good for experiments
Dashboards
Make it easy for your customers
Make Operations Visible
• Make the company visible.
• Enable communication
• Do the One Machine, One Day, One Person Challenge!
Thanks!
• The entire event is on video: http://vimeo.com/album/2559722
Media Coverage
• http://itpro.nikkeibp.co.jp/article/NEWS/20130930/507682/
• http://itpro.nikkeibp.co.jp/article/NEWS/20130930/507755/
• http://itpro.nikkeibp.co.jp/article/NEWS/20131001/507959/
• http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013.html
• http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013_1.html
• http://www.publickey1.jp/blog/13/githubdevopsboxenhubotdevops_day_tokyo_2013.html
• http://www.publickey1.jp/blog/13/githubboxenhubotdevops_day_tokyo_2013.html
• http://www.publickey1.jp/blog/13/devopsdevops_day_tokyo_2013_2.html
Attendee Coverage
• http://mass.hatenablog.com/entry/2013/09/28/205309
• http://d.hatena.ne.jp/n-sega/20130928/1380373634
• http://kazuph.hateblo.jp/entry/2013/09/28/152302
• http://jedipunkz.github.io/blog/2013/09/29/devops-day-tokyo-2013-report/
• http://toshi-miura.hatenablog.com/entry/2013/09/29/222609
• http://lewuathe.github.io/blog/2013/09/28/devopsday-tokyo-2013nixing-tutekitayo/
• http://codezine.jp/article/detail/7438