NGINX just works and that's why we use it. That does not mean that it should be left unmonitored. As a web server, it plays a central role in a modern infrastructure. As a gatekeeper, it sees every interaction with the application. If you monitor it properly it can explain a lot about what is happening in the rest of your infrastructure. In this talk you will learn more about NGINX (plus) metrics, what they mean and how to use them. You will also learn different methods (status, statsd, logs) to monitor NGINX with their pros and cons, illustrated with real data coming from real servers.
Monitoring nginx
Alexis Lê-Quôc, Datadog
@alq
Agenda
• Dramatis personae
• Observations
• Monitoring 1 nginx (plus) with logs
• Monitoring 1 nginx (plus) with metrics
• Monitoring N nginx effectively
@alq CTO at Datadog
Datadog == monitoring
• Monitoring as a service
• Works really well with large, dynamic environments (e.g. clouds)
• Aggregates performance metrics
• Correlates nginx performance with the rest of your infrastructure
Observations
From the field
Some stats
• Across all monitored servers
  • nginx ~10%
  • Apache ~5%
• CPU and CPU/$ is the dominant resource
[Figure: % of instances per core count. x-axis: core count (1, 2, 4, 8, 12, 16, 24, 32); y-axis: % of instances (0% to 40%)]
[Figure: % of instances per type (AWS only). x-axis: EC2 type (c3.l, c3.2xl, c1.xl, c3.8xl, m3.l, c3.xl, m3.m, cc2.8xl, t2.m, c3.4xl, rest); y-axis: % of instances (0% to 30%)]
Monitoring nginx
1. Monitoring with logs
2. Monitoring with status
3. Monitoring with statsd
Monitoring with logs
• Canonical example of log indexers
• Your choice of:
  • logstash
  • splunk
  • logentries, sumologic, loggly, etc.
nginx → log forwarder → indexer → UI
Monitoring with logs
nginx → log forwarder → indexer → UI
Strengths                  Weaknesses
forensics & anomalies      low signal-to-noise ratio
content-driven analysis    “black box”
Monitoring with metrics
• open-source: ngx_http_stub_status_module
  • bare-bone metrics
  • human-readable text presentation
• plus: ngx_http_status_module
  • a lot more metrics for each function
  • json format
• Your choice of…
  • Datadog, Nagios, Zabbix, etc. for open-source
  • Datadog for nginx plus
nginx status → collector → aggregator → UI/alerts
Monitoring with metrics
nginx status → collector → aggregator → UI/alerts
Strengths                  Weaknesses
lightweight & real-time    no insight into content
“white box”
Simple metrics taxonomy
1. What it measures
• Work or resource
• Focus on work because work == value
• Resource analysis is useful to understand performance
• Use Brendan Gregg’s USE method
  • Utilization (% over time)
  • Saturation (queue length)
  • Errors (count over time)
2. Type
• Gauge: a point-in-time sample
• Counter: an accumulated sample; it needs to be differentiated (turned into a rate) to be meaningful
http://www.brendangregg.com/usemethod.html
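The gauge/counter distinction above can be made concrete with a small sketch. It assumes counter samples arrive as (timestamp in seconds, value) pairs; the function name and the sample values are illustrative and not taken from the talk.

```python
from typing import List, Tuple


def counter_to_rates(samples: List[Tuple[float, int]]) -> List[float]:
    """Turn (timestamp_seconds, counter_value) samples into per-second rates."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or v1 < v0:
            # Skip out-of-order timestamps and counter resets
            # (for example after an nginx restart).
            continue
        rates.append((v1 - v0) / dt)
    return rates


# Illustrative samples: a request counter read every 10 seconds.
samples = [(0.0, 1000), (10.0, 1500), (20.0, 2500)]
print(counter_to_rates(samples))
```

A gauge such as current connections can be plotted as sampled; a counter such as requests only becomes readable once it is expressed as a rate like this.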
Open-source metrics
Class                  Type     Resource/Work  Notes
Current connections    Gauge    Resource       reading, writing, idle
Accepted connections   Counter  Resource
Handled connections    Counter  Resource       <= accepted if resource limit
Requests               Counter  Work           True purpose of the server
• Latency must be measured using logs or statsd.
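As a rough illustration of how a collector might read these metrics, the sketch below parses the plain-text page produced by ngx_http_stub_status_module. The sample text and the function name are illustrative assumptions; note that the module labels idle connections as "Waiting".

```python
import re
from typing import Dict

# Illustrative stub_status page; the numbers are made up for the example.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text: str) -> Dict[str, int]:
    """Extract gauges and counters from a stub_status text page."""
    active = re.search(r"Active connections:[ \t]+(\d+)", text)
    counters = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    states = re.search(
        r"Reading:[ \t]+(\d+)[ \t]+Writing:[ \t]+(\d+)[ \t]+Waiting:[ \t]+(\d+)", text
    )
    if not (active and counters and states):
        raise ValueError("unrecognized stub_status output")
    accepts, handled, requests = (int(x) for x in counters.groups())
    reading, writing, waiting = (int(x) for x in states.groups())
    return {
        "active": int(active.group(1)),   # gauge
        "accepts": accepts,               # counter
        "handled": handled,               # counter
        "requests": requests,             # counter
        "reading": reading,               # gauge
        "writing": writing,               # gauge
        "waiting": waiting,               # gauge (idle keep-alive connections)
        "dropped": accepts - handled,     # derived: > 0 when a resource limit was hit
    }


print(parse_stub_status(SAMPLE))
```

The derived "dropped" value is one way to express the "handled <= accepted" note in the table as a single number.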
Key “plus” metrics
Class                      Type     Resource/Work  Notes
5xx Errors                 Counter  Work           without log analysis
5xx/sum(Nxx)               Gauge    Work           error rate %
idle/dropped connections   Gauge    Resource       saturation
active/total connections   Gauge    Resource       upstream capacity
Requests                   Counter  Work           true purpose of the server
• Latency must be measured using logs or statsd.
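The 5xx/sum(Nxx) gauge in the table can be sketched as follows. The code assumes two successive readings of per-class response counters held in plain dictionaries keyed "1xx" through "5xx"; that shape and the function name are assumptions made for the example, not a statement about the exact JSON layout of the status module.

```python
from typing import Dict

CLASSES = ("1xx", "2xx", "3xx", "4xx", "5xx")


def error_rate_5xx(prev: Dict[str, int], curr: Dict[str, int]) -> float:
    """Percentage of 5xx responses among all responses between two readings."""
    deltas = {c: curr[c] - prev[c] for c in CLASSES}
    total = sum(deltas.values())
    if total <= 0 or any(d < 0 for d in deltas.values()):
        # No traffic in the interval, or a counter reset: report 0 rather than guess.
        return 0.0
    return 100.0 * deltas["5xx"] / total


# Illustrative readings taken one polling interval apart.
prev = {"1xx": 0, "2xx": 900, "3xx": 50, "4xx": 40, "5xx": 10}
curr = {"1xx": 0, "2xx": 1800, "3xx": 100, "4xx": 80, "5xx": 20}
print(error_rate_5xx(prev, curr))
```

Working on the deltas rather than the raw counters gives the error rate for the interval, not the average since the server started.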
Monitoring with statsd
nginx → statsd → UI/alerts
Strengths                           Weaknesses
lightweight, real-time, standard    not comprehensive
custom metrics, content-aware
https://github.com/zebrafishlabs/nginx-statsd
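To show why statsd is lightweight, here is a minimal sketch of its wire format: one short text line per metric, usually sent over UDP. The metric names, host, and port below are illustrative assumptions, and the sketch is written in Python rather than as nginx-statsd configuration.

```python
import socket


def format_metric(name: str, value, metric_type: str) -> bytes:
    """Build a statsd line: <name>:<value>|<type> (c = counter, ms = timer, g = gauge)."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric as a fire-and-forget UDP datagram (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric names for a request counter and a latency timer.
print(format_metric("nginx.requests", 1, "c"))
print(format_metric("nginx.request_time", 320, "ms"))
```

Calling send_metric(format_metric("nginx.requests", 1, "c")) would emit the counter to a local statsd daemon; because the transport is UDP, the sender does not wait for a reply.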
Example
Monitoring nginx
1. Logs for content analysis (forensics, anomalies, marketing)
2. Status for (white box) performance monitoring
3. statsd for custom metrics
No single method gives you everything you need.
Monitoring a lot of nginx
1. Requires aggregation
2. It’s all about metadata (“pet-to-cattle” mindset)
3. Correlation
Aggregation
• By default for log-based monitoring
• Not by default for metric-based monitoring
Metadata
• Analyze by properties that are not the host identity
• Find anomalies that are not obvious
• Pet-to-cattle evolution: hosts don’t matter, services do
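Aggregating by metadata rather than by host can be sketched as below. The tag keys ("role", "az"), host names, and values are hypothetical; the point is only that the same per-host samples can be rolled up along any property.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One sample = (tags describing the source, requests per second).
Sample = Tuple[Dict[str, str], float]


def aggregate_by(samples: Iterable[Sample], tag_key: str) -> Dict[str, float]:
    """Sum a metric across hosts, grouped by the value of one tag."""
    totals: Dict[str, float] = defaultdict(float)
    for tags, value in samples:
        totals[tags.get(tag_key, "untagged")] += value
    return dict(totals)


# Hypothetical fleet: host identity is just another tag.
samples = [
    ({"host": "web-1", "role": "frontend", "az": "us-east-1a"}, 120.0),
    ({"host": "web-2", "role": "frontend", "az": "us-east-1b"}, 80.0),
    ({"host": "api-1", "role": "api", "az": "us-east-1a"}, 40.0),
]
print(aggregate_by(samples, "role"))
print(aggregate_by(samples, "az"))
```

Grouping by "role" answers a service-level question, while grouping by "az" can surface an anomaly confined to one availability zone that no single host graph would make obvious.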
Correlation
• nginx is only one piece of the infrastructure
#plug
www.datadog.com
Thank you!
Questions/Comments? @alq