LinuxCon 2016: An introduction to datacenter telemetry using open source tools
Matt Brender (@mjbrender)
Briefly, About Me
Am: @mjbrender (everywhere)
Developer Advocate, Orchestration Engineering
Pretty good at Open Source practices
Was: Storage array performance, VMware, NoSQL
Loose Agenda
1. Wishful thinking of the lab config
2. What is telemetry
3. One opinion on the state of open source tooling
Let’s Test the Network
linuxcon.snap-telemetry.io
then
git clone
I encourage you to keep downloading stuff until you’re ready to go.
Lab Hopes
High Level View
[Diagram: an “Admin” node running Grafana, InfluxDB, and Snap, next to a “Production” node running Snap]
Less High Level View
[Diagram, built up over three slides: Your Laptop runs Vagrant with two Ubuntu 16.04 VMs; Ansible provisions them; one VM runs Snap and Docker, the other runs Snap; Docker Compose runs InfluxDB and Grafana]
Why???
Telemetry
Snap, collectd, StatsD, Telegraf, Beats, Logstash, Diamond
InfluxDB, OpenTSDB, KairosDB, Graphite, Prometheus, Elasticsearch, Bosun
Grafana, Sensu, Ganglia, RRDtool, Nagios, Facette, Vector (Netflix)
what my friends think telemetry is
what my parents think telemetry is
what society thinks telemetry is
what my boss thinks telemetry is
what I think telemetry is
what telemetry actually is
What Is Telemetry?
Telemetry is the stuff you can measure and the process of capturing it: from the heat generated on a CPU core to the throughput of Nginx running in a Docker container on a Kubernetes cluster. It’s all measurable and it’s all summarized in that one word.
• Telemetry - the process of using equipment to take measurements of something and send them to another place
• Metrics - measurements of facts throughout the data center
• Analytics - the method of logical analysis that determines the consequences of information
What Is Telemetry?
What                          How                                      Why
Application Availability      ping                                     SLA compliance
Operating System Performance  psutil                                   System performance
Hardware Utilization          Intel Performance Counter Metrics (PCM)  Scaling capacity
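The availability row can be made concrete with a small sketch. It assumes ping results have already been gathered as a list of booleans (True for a reply). The function names and the 99.9% target are illustrative assumptions, not something from the talk.

```python
def availability(ping_results):
    """Return the fraction of successful pings, between 0.0 and 1.0."""
    if not ping_results:
        raise ValueError("need at least one ping result")
    return sum(1 for ok in ping_results if ok) / len(ping_results)


def meets_sla(ping_results, target=0.999):
    """True when measured availability reaches the (hypothetical) SLA target."""
    return availability(ping_results) >= target


# One lost reply in 1000 probes.
results = [True] * 999 + [False]
print(availability(results))
print(meets_sla(results))
```

The "how" (ping) produces the raw metric; the "why" (SLA compliance) is the question asked of it afterwards.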
What snap is and what it isn’t
[Diagram labels, built up over three slides: Telemetry, Analytics, snap, Automation, Scheduling, IRO]
snap is a framework for metrics.
snap is NOT an analytics alternative.
collect -> process -> publish
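A minimal sketch of the collect, process, publish flow run on a schedule. This is plain Python for illustration only, not Snap's API; the metric names, the "lab-vm" tag, and every function name are assumptions.

```python
import time


def collect():
    """Collector: return raw metrics as (namespace, value) pairs."""
    # Hypothetical fixed readings; a real collector would query the system.
    return [("/example/load/load1", 0.42), ("/example/load/load5", 0.37)]


def process(metrics, tags):
    """Processor: attach metadata to every metric."""
    return [{"namespace": ns, "value": v, "tags": dict(tags)} for ns, v in metrics]


def publish(metrics, sink):
    """Publisher: hand processed metrics to an endpoint (here, a list)."""
    sink.extend(metrics)


def run_task(sink, runs=3, interval=0.0):
    """Scheduler: run the whole workflow a fixed number of times."""
    for _ in range(runs):
        publish(process(collect(), {"host": "lab-vm"}), sink)
        time.sleep(interval)


sink = []
run_task(sink)
print(len(sink))  # 6: three runs of two metrics each
```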
The Watcher Workflow
Collectors in snap
Collect telemetry data once via plugins for:
• Bare metal, including Intel-specific platform metrics (CPU, NIC, BMC, SMART)
• Operating Environments and existing telemetry (Docker, libvirt, psutil)
• Application services and adjacencies (Ceph, HAProxy, Etcd, Facter, MySQL, Apache)
Populate a dynamically generated single-namespace telemetry catalog
Processors in snap
Filter, alter or append metadata via plugins for:
• Filtering (Moving Averages)
• Normalization
• Encryption for all or part of the data set
• Injection of metadata
  • Tokens
  • Tenant IDs
Forking to one or more endpoints
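The moving-average filtering named for processors can be sketched in a few lines. This is a generic sliding-window average, not the code of any Snap plugin; the class name and window size are assumptions.

```python
from collections import deque


class MovingAverage:
    """Smooth a stream of values over a fixed-size sliding window."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque(maxlen=...) drops the oldest value automatically.
        self._values = deque(maxlen=window)

    def update(self, value):
        """Add a value and return the average of the current window."""
        self._values.append(value)
        return sum(self._values) / len(self._values)


avg = MovingAverage(window=3)
smoothed = [avg.update(v) for v in [10, 20, 30, 40]]
print(smoothed)  # [10.0, 15.0, 20.0, 30.0]
```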
Publishers in snap
Publish data via plugins for:
• Dashboard Tools (Graphite, Grafana, Riemann)
• Queues and Logs (RabbitMQ, Kafka, File)
• Databases (PostgreSQL, InfluxDB, OpenTSDB, SAP HANA)
To one or more endpoints
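Publishing "to one or more endpoints" amounts to fanning the same metrics out to several publishers. The sketch below uses two stand-in endpoints, a file-like stream and an in-memory list; both classes and the metric are hypothetical and do not mirror Snap's publisher interface.

```python
import io
import json


class FilePublisher:
    """Write each metric as one JSON line to a file-like object."""

    def __init__(self, stream):
        self.stream = stream

    def publish(self, metrics):
        for metric in metrics:
            self.stream.write(json.dumps(metric, sort_keys=True) + "\n")


class MemoryPublisher:
    """Keep metrics in a list, standing in for a database endpoint."""

    def __init__(self):
        self.received = []

    def publish(self, metrics):
        self.received.extend(metrics)


def publish_all(metrics, publishers):
    """Send the same batch of metrics to every endpoint."""
    for publisher in publishers:
        publisher.publish(metrics)


metrics = [{"namespace": "/example/load/load1", "value": 0.42}]
log = io.StringIO()
memory = MemoryPublisher()
publish_all(metrics, [FilePublisher(log), memory])
print(log.getvalue(), end="")
```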
Visibility at all layers
[Diagram, built up over several slides: an App / OS / HW stack feeds an Analytics Pipeline and Dashboards. The links between the layers and the pipeline start as question marks, then Snap takes their place. Later builds add a Virtualization layer, Kubernetes, and many App / OS / HW stacks.]
REST & CLI, Flexible Scheduling, Caching, Security
Plugin Lifecycle Management, Worker Queues, Metric Catalog, Tribe
Warning: Thought Leadership Ahead
Monitoring is
[Diagram, built up over several slides: Monitoring is made up of Telemetry, Alerts, Persistence, Learning, Visualization, Logging, and Notifications]
Telemetry: Collect, Process, Publish, Schedule, Automate
[Later builds place Snap and then Grafana on the diagram]
Better Thought Leadership
by @obscurify by @caskey
https://github.com/mjbrender/what-we-talk-about-when-we-talk-about-telemetry
Q&A
FAQ
Do I need telemetry?
I don’t need telemetry, I have ____________. (Graphite)
[Diagram: the “Monitoring is” breakdown (Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications) with Graphite placed on it]
Do I need monitoring?
We run ________ for monitoring. (Nagios)
[Diagram: the same breakdown with Nagios placed on it]
What Is Telemetry? (revisited)
What                          How
Application Availability      ping
Operating System Performance  psutil
Hardware Utilization          Intel Performance Counter Metrics (PCM)
How Expanded
What                          Query   Collect  Process  Publish  Visualize
Application Availability      ping    ?        ?        ?        ?
Operating System Performance  psutil  ?        ?        ?        ?
Hardware Utilization          PCM     ?        ?        ?        ?
[Final build: Snap and Grafana are placed over the question-mark columns]
Next Up
Start using Snap!
• snap-telemetry.io
• github.com/intelsdi-x
Find me:
• on The Geek Whisperers
• and @mjbrender
additional information
Everything is Challenging At Scale
[Diagrams, labels only: “Add new task”, repeated across two slides]
Scaling with Tribe
[Diagrams, labels only: “define as a tribe”, “Add new task”]
snap | What’s next?
[Diagram 1: one Physical/Virtual Host containing Scheduler, Collection, Processing, and Publishing]
[Diagram 2: several Physical/VM Hosts, with Collection, Scheduler, Processing, and Publishing distributed across them]
snap | Plugin Lifecycle
• Plugin load
  • Dynamic, does not require restart
  • snap is automatically informed by the plugin of its features, metrics, and configuration details
  • Dynamically extends the metric catalog when loaded
• Plugin unload
  • Removes metrics from the catalog automatically
• Loading a new plugin automatically upgrades running workflows in tasks
  • Optionally, collection can be pinned to a version (ex: get /intel/server/cpu/load/v1)
• Each scheduled workflow automatically uses the most mature plugin for that step
  • Coupled with dynamic plugin loading, this results in instantaneous updates to existing workflows
  • Helpful for bug fixes, security patching, improving accuracy
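The load, unload, and version-pinning behaviour described for the plugin lifecycle can be sketched with a small registry. This is an illustration only: it assumes "most mature" means the highest loaded version number, and the class, method, and plugin names are hypothetical rather than Snap's.

```python
class PluginRegistry:
    """Track loaded plugin versions and pick one for each workflow step."""

    def __init__(self):
        self._plugins = {}  # name -> {version: collect function}

    def load(self, name, version, collect_fn):
        """Load a plugin version at runtime, with no restart."""
        self._plugins.setdefault(name, {})[version] = collect_fn

    def unload(self, name, version):
        """Unload a plugin version; drop the name when none remain."""
        versions = self._plugins.get(name, {})
        versions.pop(version, None)
        if not versions:
            self._plugins.pop(name, None)

    def resolve(self, name, pinned_version=None):
        """Return (version, function): the pinned version if given,
        otherwise the highest version currently loaded (an assumption)."""
        versions = self._plugins[name]
        version = pinned_version if pinned_version is not None else max(versions)
        return version, versions[version]


registry = PluginRegistry()
registry.load("cpu-load", 1, lambda: 0.50)
registry.load("cpu-load", 2, lambda: 0.48)
print(registry.resolve("cpu-load")[0])     # 2: newest version wins
print(registry.resolve("cpu-load", 1)[0])  # 1: pinned
```

Because resolution happens each time a step runs, loading version 2 changes what an unpinned workflow uses on its next run.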
snap | Overview – Example Workflows
Customizable definition of task and related workflow:
[Diagram: example workflows, including Collect -> Publish, Collect -> Process -> Publish, Collect fanning out to several Publish steps, and Collect fanning out to several Process -> Publish branches]
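The example workflows are trees: one collect step at the root, with process and publish steps branching beneath it. A sketch of that shape as nested Python data, with a helper that lists every collect-to-publish path. The layout and plugin names ("movingaverage", "tag", "file", "influxdb", "kafka") are assumptions chosen for illustration, not Snap's task manifest schema.

```python
# Hypothetical workflow: collect fans out to one direct publisher
# and to two process steps, each with its own publisher.
workflow = {
    "publish": ["file"],
    "process": [
        {"name": "movingaverage", "publish": ["influxdb"]},
        {"name": "tag", "publish": ["kafka"]},
    ],
}


def paths(workflow):
    """Return every collect-to-publish path as a list of step names."""
    found = []

    def walk(node, prefix):
        for publisher in node.get("publish", []):
            found.append(prefix + ["publish:" + publisher])
        for processor in node.get("process", []):
            walk(processor, prefix + ["process:" + processor["name"]])

    walk(workflow, ["collect"])
    return found


for path in paths(workflow):
    print(" -> ".join(path))
```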
The Catalog
Plugins: Intel PCM, psutil, HAProxy, Docker, Intel Health
snapctl metric list
/intel/psutil/load/load1
/intel/psutil/load/load5
/intel/psutil/vm/available
/intel/pcm/EXEC
/intel/pcm/FREQ
/intel/linux/docker/cpu_stats/throttling_data/periods
/intel/server/health/score
/intel/haproxy/info/MaxConnRate
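A sketch of how a single-namespace catalog like this one could behave: plugins register namespaces when loaded, remove them when unloaded, and a listing can be narrowed by prefix. The namespaces come from the slide; the class and method names are hypothetical and are not Snap's implementation.

```python
class MetricCatalog:
    """Map metric namespaces to the plugin that provides them."""

    def __init__(self):
        self._namespaces = {}  # namespace -> plugin name

    def register(self, plugin, namespaces):
        """Called when a plugin loads: extend the catalog."""
        for namespace in namespaces:
            self._namespaces[namespace] = plugin

    def unregister(self, plugin):
        """Called when a plugin unloads: remove its metrics."""
        self._namespaces = {
            ns: p for ns, p in self._namespaces.items() if p != plugin
        }

    def list_metrics(self, prefix="/"):
        """Return sorted namespaces that start with the given prefix."""
        return sorted(ns for ns in self._namespaces if ns.startswith(prefix))


catalog = MetricCatalog()
catalog.register("psutil", [
    "/intel/psutil/load/load1",
    "/intel/psutil/load/load5",
    "/intel/psutil/vm/available",
])
catalog.register("pcm", ["/intel/pcm/EXEC", "/intel/pcm/FREQ"])
print(catalog.list_metrics("/intel/psutil/"))
```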