LinuxCon 2016 An introduction to datacenter telemetry using open source tools Matt Brender (@mjbrender)


Page 1: Intro to open source telemetry   linux con 2016

LinuxCon 2016

An introduction to datacenter telemetry using open source tools

Matt Brender (@mjbrender)

Page 2: Intro to open source telemetry   linux con 2016

Briefly, About Me

Am:
@mjbrender (everywhere)
Developer Advocate, Orchestration Engineering
Pretty good at Open Source practices

Was:
Storage array performance
VMware
NoSQL

Page 3: Intro to open source telemetry   linux con 2016

Loose Agenda

1. Wishful thinking of the lab config

2. What is telemetry

3. One opinion on the state of open source tooling

Page 4: Intro to open source telemetry   linux con 2016

Let’s Test the Network


linuxcon.snap-telemetry.io

then

git clone

I encourage you to keep downloading stuff until you’re ready to go.

Page 5: Intro to open source telemetry   linux con 2016

Lab Hopes


Page 6: Intro to open source telemetry   linux con 2016

High Level View

[Diagram: “Admin” node with Grafana + InfluxDB and Snap; “Production” node with Snap]

Page 7: Intro to open source telemetry   linux con 2016

Less High Level View

[Diagram labels: Your Laptop; Vagrant; Ubuntu 16.04 (x3)]

Page 8: Intro to open source telemetry   linux con 2016

Less High Level View

[Diagram labels: Your Laptop; Vagrant; Ansible; Ubuntu 16.04 (x3); Snap, Docker; Snap]

Page 9: Intro to open source telemetry   linux con 2016

Less High Level View

[Diagram labels: Your Laptop; Vagrant; Ansible; Ubuntu 16.04 (x3); Snap, Docker; Snap; Compose; InfluxDB, Grafana]

Page 10: Intro to open source telemetry   linux con 2016

Why???


Page 11: Intro to open source telemetry   linux con 2016


Telemetry

Page 12: Intro to open source telemetry   linux con 2016

Snap, collectd, StatsD, telegraf, beats, Logstash, diamond

InfluxDB, OpenTSDB, KairosDB, Graphite, Prometheus, ElasticSearch, Bosun

Grafana, Sensu, Ganglia, RRDtool, Nagios, Facette, Vector (Netflix)

Page 13: Intro to open source telemetry   linux con 2016


what my friends think telemetry is what my parents think telemetry is what society thinks telemetry is

what my boss thinks telemetry is what I think telemetry is what telemetry actually is

Page 14: Intro to open source telemetry   linux con 2016

What Is Telemetry?

Telemetry is the stuff you can measure and the process of capturing it: from the heat generated on a CPU core to the throughput of Nginx* running in a Docker* container on a Kubernetes cluster. It’s all measurable and it’s all summarized in that one word.

• Telemetry - the process of using equipment to take measurements of something and send them to another place

• Metrics - measurements of facts throughout the data center

• Analytics - the method of logical analysis that determines the consequences of information

Page 15: Intro to open source telemetry   linux con 2016

What Is Telemetry?

What | How
Application Availability | ping
Operating System Performance | psutil
Hardware Utilization | Intel Performance Counter Metrics (PCM)

Page 16: Intro to open source telemetry   linux con 2016

What Is Telemetry?

What | How | Why
Application Availability | ping | SLA compliance
Operating System Performance | psutil | System performance
Hardware Utilization | Intel Performance Counter Metrics (PCM) | Scaling capacity

Page 17: Intro to open source telemetry   linux con 2016

What snap is and what it isn’t


Telemetry Analytics

Page 18: Intro to open source telemetry   linux con 2016

What snap is and what it isn’t


Telemetry Analytics

snap

snap is a framework for metrics.

snap is NOT an analytics alternative.

Page 19: Intro to open source telemetry   linux con 2016

What snap is and what it isn’t


Telemetry Analytics

Automation

Scheduling

IRO

Page 20: Intro to open source telemetry   linux con 2016

The Watcher Workflow

collect → process → publish
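A minimal sketch of the collect, process, publish sequence, written in plain Python rather than as Snap plugins. Every name here (`run_workflow`, `collect_fake_load`, `tag_with_host`, the `/example/...` namespace, the host name) is hypothetical and only illustrates how the three stages hand data to each other.

```python
import time
from typing import Callable, Dict, List

Metric = Dict[str, object]


def run_workflow(
    collect: Callable[[], List[Metric]],
    process: Callable[[List[Metric]], List[Metric]],
    publish: Callable[[List[Metric]], None],
) -> List[Metric]:
    """Run one pass of the workflow: collect, then process, then publish."""
    metrics = collect()
    metrics = process(metrics)
    publish(metrics)
    return metrics


def collect_fake_load() -> List[Metric]:
    # Hypothetical collector: returns a hard-coded value instead of reading the OS.
    return [{"namespace": "/example/load/load1", "value": 0.42, "timestamp": time.time()}]


def tag_with_host(metrics: List[Metric]) -> List[Metric]:
    # Hypothetical processor: appends metadata without changing the measurement.
    return [dict(m, tags={"host": "lab-vm-1"}) for m in metrics]


published: List[Metric] = []


def publish_to_list(metrics: List[Metric]) -> None:
    # Hypothetical publisher: an in-memory list stands in for a database or queue.
    published.extend(metrics)


result = run_workflow(collect_fake_load, tag_with_host, publish_to_list)
```

Because each stage is just a callable, any one of them can be swapped out without touching the other two.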


Page 21: Intro to open source telemetry   linux con 2016


Collectors in snap

Page 22: Intro to open source telemetry   linux con 2016

Processors in snap


Page 23: Intro to open source telemetry   linux con 2016

Publishers in snap


Page 24: Intro to open source telemetry   linux con 2016

Collectors in snap

Collect telemetry data once via plugins for:
• Bare metal, including Intel specific platform metrics (CPU, NIC, BMC, SMART)
• Operating environments and existing telemetry (Docker, libvirt, psutil)
• Application services and adjacencies (Ceph, HAProxy, Etcd, Facter, MySQL, Apache)

Populate a dynamically generated single-namespace telemetry catalog

Page 25: Intro to open source telemetry   linux con 2016

Processors in snap

Filter, alter or append metadata via plugins for:
• Filtering (Moving Averages)
• Normalization
• Encryption for all or part of the data set
• Injection of metadata
  • Tokens
  • Tenant IDs

Forking to one or more endpoints
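As one illustration of the filtering step, here is a sketch of a simple moving average over the most recent samples. It is not the code of any Snap processor plugin. The class name `MovingAverage` and the window size are assumptions made for the example.

```python
from collections import deque


class MovingAverage:
    """Simple moving average over the last `window` samples (illustrative only)."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen drops the oldest sample automatically.
        self._samples = deque(maxlen=window)

    def process(self, value: float) -> float:
        """Add one sample and return the average of the samples currently held."""
        self._samples.append(value)
        return sum(self._samples) / len(self._samples)


smoother = MovingAverage(window=3)
smoothed = [smoother.process(v) for v in [1.0, 2.0, 3.0, 4.0, 5.0]]
# smoothed == [1.0, 1.5, 2.0, 3.0, 4.0]
```

Until the window fills, the average is taken over however many samples have arrived, which avoids emitting nothing for the first few collection intervals.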

Page 26: Intro to open source telemetry   linux con 2016

Publishers in snap

Publish data via plugins for:
• Dashboard Tools (Graphite, Grafana, Riemann)
• Queues and Logs (RabbitMQ, Kafka, File)
• Databases (PostgreSQL, InfluxDB, OpenTSDB, SAP HANA)

To one or more endpoints

Page 27: Intro to open source telemetry   linux con 2016

Visibility at all layers

[Diagram labels: App, OS, HW; four "?" marks; Analytics Pipeline; Dashboards]

Page 28: Intro to open source telemetry   linux con 2016

Visibility at all layers

[Diagram labels: App, OS, HW; one "?"; Analytics Pipeline; Dashboards]

Page 29: Intro to open source telemetry   linux con 2016

Visibility at all layers

[Diagram labels: App, OS, HW; Snap; Analytics Pipeline; Dashboards]

Page 30: Intro to open source telemetry   linux con 2016

Visibility at all layers

[Diagram labels: two App/OS/HW stacks, one with a Virtualization layer; Snap; Analytics Pipeline; Dashboards]

Page 31: Intro to open source telemetry   linux con 2016

Visibility at all layers

[Diagram labels: two App/OS/HW stacks, one with a Virtualization layer; Snap; Analytics Pipeline; Dashboards]

Page 32: Intro to open source telemetry   linux con 2016

Visibility at all layers

[Diagram labels: two App/OS/HW stacks; Snap; Kubernetes; Analytics Pipeline; Dashboards]

Page 33: Intro to open source telemetry   linux con 2016

Visibility at all layers

[Diagram labels: seven App/OS/HW stacks; Snap; Kubernetes]

Page 34: Intro to open source telemetry   linux con 2016

REST & CLI, Flexible Scheduling, Caching, Security

Plugin Lifecycle Management, Worker Queues, Metric Catalog, Tribe

Page 35: Intro to open source telemetry   linux con 2016

Warning: Thought Leadership Ahead

Page 36: Intro to open source telemetry   linux con 2016

Monitoring is


Monitoring

Page 37: Intro to open source telemetry   linux con 2016

Monitoring is

[Diagram labels: Monitoring; Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications]

Page 38: Intro to open source telemetry   linux con 2016


Monitoring is

Telemetry

Page 39: Intro to open source telemetry   linux con 2016

Monitoring is

[Diagram labels: Telemetry; Collect, Process, Publish, Schedule, Automate]

Page 40: Intro to open source telemetry   linux con 2016

Monitoring is

[Diagram labels: Monitoring; Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications]

Page 41: Intro to open source telemetry   linux con 2016

Monitoring is

[Diagram labels: Monitoring; Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications; Snap]

Page 42: Intro to open source telemetry   linux con 2016

Monitoring is

[Diagram labels: Monitoring; Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications; Grafana]

Page 43: Intro to open source telemetry   linux con 2016

Better Thought Leadership


by @obscurify by @caskey

https://github.com/mjbrender/what-we-talk-about-when-we-talk-about-telemetry

Page 44: Intro to open source telemetry   linux con 2016

Q&A


Page 45: Intro to open source telemetry   linux con 2016

FAQ


Do I need telemetry?

Page 46: Intro to open source telemetry   linux con 2016

FAQ


I don’t need telemetry, I have ____________.

Page 47: Intro to open source telemetry   linux con 2016

FAQ

I don’t need telemetry, I have Graphite.

Page 48: Intro to open source telemetry   linux con 2016

Monitoring is

[Diagram labels: Monitoring; Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications; Graphite]

Page 49: Intro to open source telemetry   linux con 2016

FAQ


Do I need monitoring?

Page 50: Intro to open source telemetry   linux con 2016

FAQ

We run Nagios for monitoring.

Page 51: Intro to open source telemetry   linux con 2016

Monitoring is

[Diagram labels: Monitoring; Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications; Nagios]

Page 52: Intro to open source telemetry   linux con 2016

What Is Telemetry? (revisited)

What | How
Application Availability | ping
Operating System Performance | psutil
Hardware Utilization | Intel Performance Counter Metrics (PCM)

Page 53: Intro to open source telemetry   linux con 2016

What Is Telemetry? (revisited)

How, expanded:

What | Query | Collect | Process | Publish | Visualize
Application Availability | ping | ? | ? | ? | ?
Operating System Performance | psutil | ? | ? | ? | ?
Hardware Utilization | PCM | ? | ? | ? | ?

Page 54: Intro to open source telemetry   linux con 2016

What Is Telemetry? (revisited)

How, expanded:

What | Query | Collect | Process | Publish | Visualize
Application Availability | ping | ? | ? | ? | ?
Operating System Performance | psutil | ? | ? | ? | ?
Hardware Utilization | PCM | ? | ? | ? | ?

Snap, Grafana

Page 55: Intro to open source telemetry   linux con 2016


Page 56: Intro to open source telemetry   linux con 2016

Next Up

Start using Snap!
• snap-telemetry.io
• github.com/intelsdi-x

Find me:
• on The Geek Whisperers
• and @mjbrender

Page 57: Intro to open source telemetry   linux con 2016

additional information


Page 58: Intro to open source telemetry   linux con 2016

Everything is Challenging At Scale


Page 59: Intro to open source telemetry   linux con 2016

Add new task


Page 60: Intro to open source telemetry   linux con 2016

Add new task


Page 61: Intro to open source telemetry   linux con 2016

define as a tribe

Scaling with Tribe


Page 62: Intro to open source telemetry   linux con 2016

Scaling with Tribe

Add new task


Page 63: Intro to open source telemetry   linux con 2016

snap | What’s next?

Physical/Virtual Host

Scheduler

Processing

Publishing

Collection


Page 64: Intro to open source telemetry   linux con 2016

snap | What’s next?


Physical/VM Host

Physical/VM Host

Physical/VM Host

Physical/VM Host

Physical/VM Host Physical/VM Host

Collection

Collection

Collection

Scheduler

Processing Publishing

Page 65: Intro to open source telemetry   linux con 2016

snap | Plugin Lifecycle

• Plugin load
  • Dynamic, does not require restart
  • The framework is automatically informed by the plugin of its features, metrics, and configuration details
  • Dynamically extends the metric catalog when loaded
• Plugin unload
  • Removes metrics from the catalog automatically
• Loading a new plugin automatically upgrades running workflows in tasks
  • Optionally the collection can be pinned to a version (ex: get /intel/server/cpu/load/v1)
• Each scheduled workflow automatically uses the most mature plugin for that step
  • Coupled with dynamic plugin loading, this results in instantaneous updates to existing workflows
  • Helpful for bug fixes, security patching, improving accuracy
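The load, unload, and version-pinning behavior described on this slide can be sketched with a small in-memory catalog. This is an illustration, not Snap's implementation: the `MetricCatalog` class, its methods, and the plugin names are hypothetical, and "most mature" is assumed here to mean the highest loaded version number.

```python
import re
from typing import Dict, Tuple


class MetricCatalog:
    """Toy catalog mapping a metric namespace to the plugin versions that serve it."""

    def __init__(self) -> None:
        # namespace -> {version number: plugin name}
        self._entries: Dict[str, Dict[int, str]] = {}

    def load(self, namespace: str, version: int, plugin: str) -> None:
        """Loading a plugin extends the catalog; no restart step is modeled."""
        self._entries.setdefault(namespace, {})[version] = plugin

    def unload(self, namespace: str, version: int) -> None:
        """Unloading removes that version, and the namespace once no version is left."""
        versions = self._entries.get(namespace, {})
        versions.pop(version, None)
        if not versions:
            self._entries.pop(namespace, None)

    def resolve(self, request: str) -> Tuple[int, str]:
        """Resolve '/ns' to the highest version, or '/ns/vN' to the pinned version."""
        match = re.fullmatch(r"(.+)/v(\d+)", request)
        if match and match.group(1) in self._entries:
            pinned = int(match.group(2))
            return pinned, self._entries[match.group(1)][pinned]
        versions = self._entries[request]
        latest = max(versions)
        return latest, versions[latest]


catalog = MetricCatalog()
catalog.load("/intel/server/cpu/load", 1, "cpu-collector-v1")
before = catalog.resolve("/intel/server/cpu/load")

# Loading a newer plugin changes what an unpinned request resolves to.
catalog.load("/intel/server/cpu/load", 2, "cpu-collector-v2")
after = catalog.resolve("/intel/server/cpu/load")
pinned = catalog.resolve("/intel/server/cpu/load/v1")
```

In this sketch an unpinned request moves from version 1 to version 2 as soon as the second plugin is loaded, while the `/v1` request keeps resolving to the first plugin.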


Page 66: Intro to open source telemetry   linux con 2016

snap | Overview – Example Workflows

Customizable definition of task and related workflow:

[Diagram: example workflows built from Collect, Process, and Publish steps, including workflows that branch to more than one Publish step]
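One way to picture a branching workflow is as a tree in which each step passes its output to every child step. The sketch below assumes that model. The `Step` dataclass, the `execute` function, and the step names are hypothetical and do not reflect Snap's task manifest format.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One node in a workflow tree: a named function plus the steps that follow it."""
    name: str
    run: Callable[[list], list]
    children: List["Step"] = field(default_factory=list)


def execute(step: Step, data: list, trace: List[str]) -> None:
    """Run a step, then feed its output to each child (depth-first)."""
    output = step.run(data)
    trace.append(step.name)
    for child in step.children:
        execute(child, output, trace)


def publish_to(sink: list) -> Callable[[list], list]:
    """Build a publish function that appends whatever it receives to `sink`."""
    def run(data: list) -> list:
        sink.extend(data)
        return data
    return run


file_sink: list = []
db_sink: list = []

# collect -> publish raw values, and collect -> process -> publish processed values
workflow = Step("collect", lambda _: [1.0, 2.0, 3.0], [
    Step("publish-raw", publish_to(file_sink)),
    Step("process-double", lambda data: [2 * v for v in data], [
        Step("publish-processed", publish_to(db_sink)),
    ]),
])

trace: List[str] = []
execute(workflow, [], trace)
```

Here the collected values reach one sink unchanged and a second sink after processing, which corresponds to a workflow that forks to more than one endpoint.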


Page 67: Intro to open source telemetry   linux con 2016

The Catalog

snap plugins: Intel PCM, psutil, HAProxy, Docker, Intel Health

snapctl metric list

/intel/psutil/load/load1
/intel/psutil/load/load5
/intel/psutil/vm/available
/intel/pcm/EXEC
/intel/pcm/FREQ
/intel/linux/docker/cpu_stats/throttling_data/periods
/intel/server/health/score
/intel/haproxy/info/MaxConnRate
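Because every metric lives in one path-like namespace, selecting a group of metrics can be as simple as matching a prefix segment by segment. The sketch below uses the namespaces shown on this slide. The `metrics_under` helper is hypothetical and is not part of snapctl.

```python
from typing import List

# Namespaces copied from the catalog slide.
CATALOG = [
    "/intel/psutil/load/load1",
    "/intel/psutil/load/load5",
    "/intel/psutil/vm/available",
    "/intel/pcm/EXEC",
    "/intel/pcm/FREQ",
    "/intel/linux/docker/cpu_stats/throttling_data/periods",
    "/intel/server/health/score",
    "/intel/haproxy/info/MaxConnRate",
]


def metrics_under(catalog: List[str], prefix: str) -> List[str]:
    """Return namespaces whose leading segments equal the segments of `prefix`."""
    prefix_parts = prefix.strip("/").split("/")
    return [
        ns for ns in catalog
        if ns.strip("/").split("/")[:len(prefix_parts)] == prefix_parts
    ]


load_metrics = metrics_under(CATALOG, "/intel/psutil/load")
pcm_metrics = metrics_under(CATALOG, "/intel/pcm")
partial = metrics_under(CATALOG, "/intel/p")
```

Comparing whole segments rather than raw string prefixes means `/intel/p` matches nothing, instead of accidentally matching both `psutil` and `pcm`.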