Izabella Raulin - infoshare.pl

Telemetry & Analytics

Izabella Raulin

[email protected]

https://github.com/IzabellaRaulin

https://github.com/intelsdi-x/snap

© 2016 Intel Corporation

Agenda

1. Software Defined Infrastructure Team

2. Data center scheduling and workload management

• Intelligent Resource Orchestration

3. Role of telemetry in resource orchestration

4. Snap – an open source telemetry tool

• How to get started with Snap

• How to monitor at scale easily

• Snap DEMO

The next level of cloud evolution

Data centers emerge that are:

• smarter

• self-aware

• self-optimizing

• self-scaling

• self-healing

Jevons Paradox occurs “when technological progress increases the efficiency with which a resource is used (reducing the amount necessary for any one use), but the rate of consumption of that resource rises because of increasing demand.”

Intelligent Resource Orchestration

Watch: Observe the environment and collect metrics

Decide: Based on observation and knowledge, determine the best decision

Act: Take an action on that decision

Learn: Learn from that decision and improve future ones
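The Watch, Decide, Act, Learn stages form a closed control loop. The sketch below is a minimal illustration only: the `Orchestrator` class, the CPU-utilization input, the 0.80 threshold, and the scale-out/scale-in actions are all assumptions made for this example and do not come from Snap or from the talk.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Orchestrator:
    """Toy Watch/Decide/Act/Learn loop (hypothetical, for illustration)."""
    threshold: float = 0.80                       # assumed scale-out utilization
    replicas: int = 1
    history: list = field(default_factory=list)   # accumulated knowledge

    def watch(self, samples):
        # Watch: reduce raw metric samples to a single observation.
        return mean(samples)

    def decide(self, observation):
        # Decide: combine the observation with knowledge (the threshold).
        if observation > self.threshold:
            return "scale_out"
        if observation < self.threshold / 2 and self.replicas > 1:
            return "scale_in"
        return "hold"

    def act(self, decision):
        # Act: apply the decision to the (simulated) environment.
        if decision == "scale_out":
            self.replicas += 1
        elif decision == "scale_in":
            self.replicas -= 1

    def learn(self, observation, decision):
        # Learn: record the outcome; after three consecutive scale-outs,
        # lower the threshold so future decisions react earlier.
        self.history.append((observation, decision))
        recent = [d for _, d in self.history[-3:]]
        if recent == ["scale_out"] * 3:
            self.threshold = max(0.5, self.threshold - 0.05)

    def step(self, samples):
        observation = self.watch(samples)
        decision = self.decide(observation)
        self.act(decision)
        self.learn(observation, decision)
        return decision
```

Each call to `step` runs one pass through the loop; the quality of every later stage depends on the telemetry supplied to `watch`.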

What to measure and how to measure it?

perf, ethtool, iostat, pidstat, netstat, htop, collectd, diamond

Every potential indicator has its own set of tools, and those tools don’t necessarily integrate with one another.

What to measure? How to measure it? What next?

• How to collect and combine metrics gathered from different tools?

• How to avoid writing custom scripts?

• Where to store collected data?

• How to monitor at scale easily?

• How to correlate data to produce valuable analysis?

• How to visualize collected metrics?

• How to compare data and collaborate with other teams on it?

snap – an open telemetry framework

Easily collect, process, and publish telemetry data at scale

• Empower systems to expose a consistent set of telemetry data

• Simplify telemetry ingestion across ubiquitous storage systems

• Improve the deployment model, packaging and flexibility for collecting telemetry

• Allow flexible processing of telemetry data on agent (e.g. filtering and decoration)

• Provide powerful clustered control of telemetry workflows across small or large clusters (Tribe)

Snap is not intended to:

• Operate as an analytics platform; it is intended to feed them

• Compete with existing metric/monitoring/telemetry agents

snap | Workflow

snap | Collectors

Collect telemetry data via plugins for:

Hardware: SNMP, CPU, Disk, NIC, Intel NodeManager, Intel PCM, SMART, …

Containers and VMs: Cgroups, Docker, Libvirt, Mesos, Perf events, Processes, …

Applications and Services: Apache, Cassandra, Ceph, Etcd, HAProxy, InfluxDB, MySQL, NFS, RabbitMQ, …

OpenStack: Nova, Cinder, Glance, Keystone, Neutron

snap | Processors

Filter, alter, or append metadata as many times as you need via plugins for:

Filtering

Anomaly Detection

Statistics and Normalization

Encryption for all or part of the data set

Injection of remote requires for tokens

snap | Publishers

Publish data as many times as you need via plugins for:

Dashboard Tools: Graphite, Grafana, Riemann

Queues and Logs: RabbitMQ, Kafka, File

Databases: PostgreSQL, InfluxDB, OpenTSDB, MySQL, HANA, Etcd, KairosDB

Store the same telemetry through independent pipelines.
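The collect, process, publish flow described on the Collectors, Processors, and Publishers slides can be pictured as a small pipeline. This is a rough sketch under stated assumptions, not Snap's plugin API: `collect`, `filter_prefix`, `ListPublisher`, `run_workflow`, and the `/demo/...` metric namespaces are hypothetical names invented for this example.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]   # metric namespace -> value (assumed shape)


def collect() -> Metrics:
    # Stand-in collector: a real plugin would read from hardware,
    # a container runtime, or a service. Values here are made up.
    return {
        "/demo/cpu/load1": 0.42,
        "/demo/cpu/load5": 0.40,
        "/demo/mem/free": 2048.0,
    }


def filter_prefix(prefix: str) -> Callable[[Metrics], Metrics]:
    # Processor factory: keep only metrics under the given namespace prefix.
    def process(metrics: Metrics) -> Metrics:
        return {k: v for k, v in metrics.items() if k.startswith(prefix)}
    return process


class ListPublisher:
    """Stand-in publisher that keeps what it receives in memory."""

    def __init__(self) -> None:
        self.received: List[Metrics] = []

    def publish(self, metrics: Metrics) -> None:
        self.received.append(dict(metrics))


def run_workflow(collector, processors, publishers) -> None:
    # Collect once, apply each processor in order, then hand the same
    # result to every publisher independently.
    metrics = collector()
    for process in processors:
        metrics = process(metrics)
    for publisher in publishers:
        publisher.publish(metrics)
```

For example, `run_workflow(collect, [filter_prefix("/demo/cpu")], [db, queue])` with two `ListPublisher` instances delivers the same filtered metrics to both, mirroring the idea of publishing one collection through independent pipelines.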

snap | Plugins

List of all available plugins:

https://github.com/intelsdi-x/snap/blob/master/docs/PLUGIN_CATALOG.md

(*) Right now snap only supports Linux and OS X (Darwin)

snap | Plugin Lifecycle

a) Plugin load

• Dynamic, does not require restart

• Snap is automatically informed by the plugin of its features and metrics

• Dynamically extends the metric catalog when loaded

b) Plugin unload

• Removes metrics from catalog automatically

c) Plugin swap

• Swaps a newer version plugin for an old one in a safe transaction

Dynamic plugin operations mean loading, updating, and unloading plugins without restarting snap or extra configuration management. This enables simple and secure bug fixes, security patching, and improved accuracy in production.
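The load, unload, and swap operations can be modeled with a small in-memory catalog. This is only a sketch of the idea, assuming a hypothetical `MetricCatalog` class, integer plugin versions, and made-up `/demo/...` metric names; it does not reproduce how Snap implements plugin management.

```python
class MetricCatalog:
    """Toy catalog tracking which loaded plugin exposes which metrics."""

    def __init__(self):
        self._plugins = {}   # plugin name -> (version, set of metric names)

    def load(self, name, version, metrics):
        # Loading a plugin extends the catalog with the metrics it reports.
        if name in self._plugins:
            raise ValueError(f"plugin {name!r} is already loaded")
        self._plugins[name] = (version, set(metrics))

    def unload(self, name):
        # Unloading a plugin removes its metrics from the catalog.
        return self._plugins.pop(name)

    def swap(self, name, new_version, new_metrics):
        # Replace the old plugin with a newer one; if anything fails,
        # restore the old plugin so the catalog is never left half-updated.
        old = self.unload(name)
        try:
            if new_version <= old[0]:
                raise ValueError("replacement must be a newer version")
            self.load(name, new_version, new_metrics)
        except Exception:
            self._plugins[name] = old
            raise

    def metrics(self):
        # Current metric catalog, derived from whatever is loaded right now.
        return sorted(m for _, ms in self._plugins.values() for m in ms)
```

Because the catalog is always derived from the currently loaded plugins, no restart step is needed for it to reflect a load, unload, or swap.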

Everything is Challenging At Scale

snap | Tribe

Send a task to one host and have it replicated to all hosts
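The idea of submitting a task to one host and seeing it appear on every host can be illustrated with a toy cluster. The propagation used here is a simple forward-to-peers flood chosen for clarity; it is an assumption of this sketch and not a description of how Tribe is implemented. The `Node` class and the task contents are hypothetical.

```python
class Node:
    """Toy cluster member that forwards new tasks to its peers."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.tasks = {}   # task id -> task definition

    def connect(self, other):
        # Peering is symmetric in this sketch.
        self.peers.append(other)
        other.peers.append(self)

    def submit(self, task_id, task):
        # Ignore tasks already known; this also stops the flood from looping.
        if task_id in self.tasks:
            return
        self.tasks[task_id] = task
        for peer in self.peers:
            peer.submit(task_id, task)
```

With four nodes connected in a chain, calling `submit` on the first node leaves the same task registered on all four.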

snap | Distributed workload

[Diagram: Physical/VM Hosts running Collection, Scheduler, Processing, and Publishing components]

snap up to find more

GitHub https://github.com/intelsdi-x/snap

Slack channel https://intelsdi-x.slack.com/messages/snap-telemetry/

Medium blogposts https://medium.com/intel-sdi

“Snap and Kubernetes: together at last” written by Andrzej Kuriata

“Setting Up Your Snap Development Environment” written by Sarah Han

“Measuring Snap performance” written by Olivier Cano

“What I mean by telemetry” written by Matthew Brender

“The Guts of Tasks: How Snap Gathers Telemetry” written by Matthew Brender

See latest release on https://github.com/intelsdi-x/snap/releases

Golang

http://www.meetup.com/GoLang-User-Group-Wroclaw/

https://www.facebook.com/GolangTricity

Izabella Raulin

[email protected]
