31
From 10 standalone Zabbix platforms to a major one in the cloud. Zabbix-Summit 2019 – 11/12 october in the cloud

the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

From 10 standalone Zabbix platforms to a major one in the cloud.Zabbix-Summit 2019 – 11/12 october

in the cloud

Page 2: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

The story begins for me in 3DS Outscale

2014 My job interview

• ”What will you do during your first weeks ?”

3DS Outscale

• Dassault Systèmes IaaS Cloud Provider

• Several datacenters around the world

• Tons of equipments, VMs, expert workers

My first weeks

• Let’s focus on the monitoring subject!

Page 3: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

First steps with Zabbix

• Several Zabbix Servers

• 1 Datacenter:

� 1 Zabbix-Server + Frontend

� 1 PostgreSQL Database

Page 4: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

What was my first year like ?

• Zabbix Configuration• Zabbix templating• Scripting for UserParameters• Reworked the alerting• Dashboarding using Ruby / Dashing

My daily work:

I soon became the monitoring guy.

Page 5: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Multiple Zabbix

platforms everywhere!

Z

ZZ

Z

ZZZ

ZZ

Z

Zabbix Server +

Frontends

Zabbix DB X 10+

Page 6: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Not totally satisfied

Minor issues:

- Configuration management

- Template deployment

- Dogfooding

Major issues:

- Performance

- Resilience

Page 7: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

The Zabbix-Cloud project was born…

• Create a better Zabbix Platform with :

� Performance� Resilience� Convenient to use for the teams

Page 8: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

I started by requesting politely…

•And became certified

A Zabbix Training

•That’s when Kévin, Kévin and Romain joined me.

A small team

Page 9: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

The monitoring team.

Manages the monitoring platforms.

Provide the monitoring work requested by the other teams.

Work on various projects.

Page 10: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Our toolkit!

• Saltstack

• Salt-cloud

• Git

• Backup tools

• Cloud accounts to request the API

Page 11: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

We kept our old Zabbix platforms.

Z

ZZ

Z

ZZZ

ZZ

Z

For the moment, they will survive.

Page 12: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

We then created our first instance: “The Adm”

• We deployed the ADM in the EUW2 Region

• Deployed our toolkit

• Started writing Salt states and cloud infrastructure configuration

All proxies to par1.

A

Page 13: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

The Monitoring

VPCs

Connectivity to private subnets with production equipment and vms.

EU-WEST Monitoring

VPCFirewall FR

Firewall Asia

Firewall US

VPN

Vlan to monitor

Vlan to monitor

Vlan to monitor

Vlan to monitor

Vlan to monitor

SOUTHEAST-ASIA

Monitoring VPC

US-EAST Monitoring

VPCVPN

VPN Vlan to monitor

Page 14: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

A few months later…

All proxies to par1.

Z

Zabbix-Proxies

Zabbix –Cloud server platform..

Z

A

Page 15: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Remember the small architecture ? Here is

what it looks like now !

Page 16: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during
Page 17: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Add 2 hundred proxies

( Poor config syncer ! )

Sending them to the Zabbix-Server

Collecting data everywhere

Page 18: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Tips for better performance

Database

DB partitioning and Tuning

High IOPS volumes for the DB

Strong CPU

Massive ram for InnoDB cache

Specific to Zabbix

Active items

Proxies – Active too –

Internal process monitoring

Page 19: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Tips for resilience

Multiple frontends

Multiple DB nodes

Multiple Asterisk

LBs

Send ALL your monitoring data twice, to multiple platforms!

Page 20: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Two Zabbix-cloud

platforms

All proxies to par1.

ZZabbix-Proxies

Zabbix-Cloud Server platforms.

Z

Z

Every Zabbix Agent have 2 proxies in ServerActive=

Page 21: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Both monitoring platforms

operating a the same

time

Zabbix-proxy-1

FR Server

Zabbix-proxy-2

US Server

Grafana-1

Page 22: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Twin monitoring platforms Fe

atu

rin

g th

e in

stan

ces US Zabbix

Server

• Server • Databases + LB• Frontends + LB• Grafana + LB• Jenkins + LB• Asterisk• Smashing• Custom

dashboards in NodeJS, PHP...

Jen

kin

s jo

bs

2x days

Config sync

+

Dashboards

Feat

uri

ng

the

inst

ance

s FR Zabbix Server

• Server • Databases + LB• Frontends + LB• Grafana + LB• Jenkins + LB• Asterisk• Smashing• Custom

dashboards in NodeJS, PHP...

Page 23: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

How is the configuration sync done?

MySQL dump

• Configuration tables.

Copy the dump

• To a node of the other Zabbix platform.

Stop everything that queries the DB

Restore the table in parallel

Restart everything

• Server, frontends, Grafana…

Page 24: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Pack everything in an orchestration state

Play it regularly through a Jenkins

state.We do it 2x a day.

It failed 2 times in 2 years.

Name your -1 and -2 proxies with the same proxy name.

Page 25: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Simplicity benefits

Grafana Dashboards• Our users can check their data.• Build their dashboards.• With no access to Zabbix.

A single Zabbix interface.Fast updates and configuration.

Single API

Page 26: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Firing alerts… avoiding x2 calls

Zabbix Server USMain

Asterisk-FR-1

Asterisk-FR-2

Asterisk-US-1

Asterisk-US-2

Zabbix Server FRSecondary

Page 27: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Where we are in summer 2019

Availability Max processed values /s

Items Proxies deployed AVG processed values

100% 250 000 during a burst.

1 300 000 active 160+ 10000

AVG Bandwith on server

Max tested processed alerts

Amount of TV using Grafana dashboards

Charge on the server

DB size

35 mbps Up to 30k 200+ Around 5% 1500 GB

Page 28: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

And about the old Zabbix Platforms?

Audited

Extracted the templates and

items

Abandoned

Deleted

Page 29: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

Daily mission : Monitoring

the monitoring platforms.

Page 30: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

What is coming next?

• Docker migration of everything ( Done for a part.)

• 4.2 migration

• Tons of new proxies and new cloud sites

• Opensourcing some of our home-made tools

Page 31: the cloud. From 10 standalone Zabbix platforms to …...Where we are in summer 2019 Availability Max processed values /s Items Proxies deployed AVG processed values 100% 250 000 during

OUTSCALE1 rue Royale319 bureaux de la Colline92210 Saint-Cloud - Francetel: +33 1 53 27 52 70

outscale.com•Any questions?

• Let’s talk about it.

•Or mail [email protected]