Monitoring Publicly Available Services

This document describes how you can monitor publicly available services, applications and protocols. By "public" I mean services that are accessible across the network - either the local network or the greater Internet. Examples of public services include HTTP, POP3, IMAP, FTP and SSH. There are many more public services that you probably use on a daily basis. These services and applications, as well as their underlying protocols, can usually be monitored by Nagios without any special access requirements.

Private services, in contrast, cannot be monitored with Nagios without an intermediary agent of some kind. Examples of private services associated with hosts are things like CPU load, memory usage, disk usage, current user count, process information, etc. These private services or attributes of hosts are not usually exposed to external clients. This situation requires that an intermediary monitoring agent be installed on any host for which you need to monitor such information. More information on monitoring private services on different types of hosts can be found in the documentation on:

• Monitoramento Host (http://172.17.2.82/mediawiki/index.php/Monitoramento_Host)

Plugins For Monitoring Services

When you find yourself needing to monitor a particular application, service or protocol, chances are good that a plugin exists to monitor it. The official Nagios plugins distribution comes with plugins that can be used to monitor a variety of services and protocols. There are also a large number of contributed plugins that can be found in the contrib/ subdirectory of the plugin distribution. The NagiosExchange.org website hosts a number of additional plugins that have been written by users, so check it out when you have a chance.

If you don't happen to find an appropriate plugin for monitoring what you need, you can always write your own. Plugins are easy to write, so don't let this thought scare you off. Read the documentation on developing plugins for more information.

I'll walk you through monitoring some basic services that you'll probably use sooner or later. Each of these services can be monitored using one of the plugins that gets installed as part of the Nagios plugins distribution.

Monitoring HTTP

Chances are you're going to want to monitor web servers at some point - either yours or someone else's. The check_http plugin is designed to do just that. It understands the HTTP protocol and can monitor response time, error codes, strings in the returned HTML, server certificates, and much more.


The commands.cfg file contains a command definition for using the check_http plugin. It looks like this:

    define command{
        name            check_http
        command_name    check_http
        command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
        }

A simple service definition for monitoring the HTTP service on the remotehost machine might look like this:

    define service{
        use                     generic-service     ; Inherit default values from a template
        host_name               remotehost
        service_description     HTTP
        check_command           check_http
        }

This simple service definition will monitor the HTTP service running on remotehost. It will produce alerts if the web server does not respond within 10 seconds or if it returns HTTP error codes (403, 404, etc.). That is all you need for basic monitoring.

A more advanced service definition is shown below. It requests the URI /download/index.php (-u), allows the web server 5 seconds to respond (-t), and checks that the returned content contains the string "latest-version.tar.gz" (-s).

    define service{
        use                     generic-service     ; Inherit default values from a template
        host_name               remotehost
        service_description     Product Download Link
        check_command           check_http!-u /download/index.php -t 5 -s "latest-version.tar.gz"
        }
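The link between a service's check_command and the command_line in commands.cfg can be illustrated with a small Python sketch. It is only an approximation of what Nagios does internally: the function name expand_check_command, the plugin directory /usr/local/nagios/libexec and the address 192.0.2.10 are assumptions made for the example, and escaping of "!" inside arguments is ignored.

```python
import re


def expand_check_command(check_command, command_lines, macros):
    """Expand a check_command such as 'check_http!-u /x' into a full command line."""
    # The command name and its arguments are separated by '!'.
    name, *args = check_command.split("!")
    line = command_lines[name]

    # $ARG1$, $ARG2$, ... are filled from the '!'-separated arguments.
    values = dict(macros)
    for index, arg in enumerate(args, start=1):
        values[f"$ARG{index}$"] = arg

    for macro, value in values.items():
        line = line.replace(macro, value)

    # Any $ARGn$ macro without a matching argument becomes an empty string.
    line = re.sub(r"\$ARG\d+\$", "", line)
    return line.strip()


# Hypothetical values for illustration only.
COMMAND_LINES = {"check_http": "$USER1$/check_http -I $HOSTADDRESS$ $ARG1$"}
MACROS = {"$USER1$": "/usr/local/nagios/libexec", "$HOSTADDRESS$": "192.0.2.10"}

if __name__ == "__main__":
    print(expand_check_command(
        'check_http!-u /download/index.php -t 5 -s "latest-version.tar.gz"',
        COMMAND_LINES,
        MACROS,
    ))
```

With the basic service definition (check_command check_http and no arguments), $ARG1$ is simply empty and the plugin runs with its defaults.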

Monitoring FTP

When you need to monitor FTP servers, you can use the check_ftp plugin. The commands.cfg file contains a command definition for using the check_ftp plugin, which looks like this:

    define command{
        command_name    check_ftp
        command_line    $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$
        }

A simple service definition for monitoring the FTP server on remotehost would look like this:

    define service{
        use                     generic-service     ; Inherit default values from a template
        host_name               remotehost
        service_description     FTP
        check_command           check_ftp
        }

This service definition will monitor the FTP service and generate alerts if the FTP server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the FTP server running on port 1023 on remotehost. It will generate an alert if the server doesn't respond within 5 seconds or if the server response doesn't contain the string "Pure-FTPd [TLS]".

    define service{
        use                     generic-service     ; Inherit default values from a template
        host_name               remotehost
        service_description     Special FTP
        check_command           check_ftp!-p 1023 -t 5 -e "Pure-FTPd [TLS]"
        }
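The decision this kind of check makes (timeout first, then the expected string) can be sketched in Python as follows. This is not the plugin's code: the function evaluate_response is hypothetical, and the state returned on a string mismatch is an assumption exposed as a parameter, since the text above only says that an alert is generated.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate_response(banner, elapsed, timeout=10.0, expect=None,
                      mismatch_state=WARNING):
    """Map a server greeting and its response time to a Nagios state.

    banner  -- first line sent by the server, or None if nothing arrived
    elapsed -- seconds it took to receive the banner
    expect  -- substring that must appear in the banner (like -e)
    mismatch_state -- state for a missing substring (assumed default)
    """
    if banner is None or elapsed > timeout:
        return CRITICAL, f"no response within {timeout:g} seconds"
    if expect is not None and expect not in banner:
        return mismatch_state, f"unexpected response: {banner!r}"
    return OK, f"{elapsed:.3f} second response time"


if __name__ == "__main__":
    greeting = "220---------- Welcome to Pure-FTPd [TLS] ----------"
    print(evaluate_response(greeting, 0.2, timeout=5, expect="Pure-FTPd [TLS]"))
```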

Monitoring SSH

When you need to monitor SSH servers, you can use the check_ssh plugin. The commands.cfg file contains a command definition for using the check_ssh plugin, which looks like this:

    define command{
        command_name    check_ssh
        command_line    $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
        }

A simple service definition for monitoring the SSH server on remotehost would look like this:

    define service{
        use                     generic-service     ; Inherit default values from a template
        host_name               remotehost
        service_description     SSH
        check_command           check_ssh
        }

This service definition will monitor the SSH service and generate alerts if the SSH server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the SSH server and generate an alert if the server doesn't respond within 5 seconds or if the server version string doesn't match "OpenSSH_4.2".

    define service{
        use                     generic-service     ; Inherit default values from a template
        host_name               remotehost
        service_description     SSH Version Check
        check_command           check_ssh!-t 5 -r "OpenSSH_4.2"
        }
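To make the version check more concrete, the following Python sketch splits an SSH identification string of the form "SSH-protoversion-softwareversion comments" and compares the software version with the expected value. The helper names are hypothetical, and the exact-match comparison is an assumption about how such a check behaves, not a description of check_ssh internals.

```python
def parse_ssh_banner(banner):
    """Split an SSH identification string into (protocol, software, comments)."""
    line = banner.strip()
    if not line.startswith("SSH-"):
        raise ValueError(f"not an SSH identification string: {line!r}")
    # Optional comments follow the first space.
    ident, _, comments = line.partition(" ")
    # ident looks like 'SSH-2.0-OpenSSH_4.2'; split off the two leading fields.
    _, protocol, software = ident.split("-", 2)
    return protocol, software, comments


def version_matches(banner, expected):
    """Return True if the server's software version equals the expected string."""
    _, software, _ = parse_ssh_banner(banner)
    return software == expected


if __name__ == "__main__":
    print(parse_ssh_banner("SSH-2.0-OpenSSH_4.2\r\n"))
    print(version_matches("SSH-2.0-OpenSSH_4.2\r\n", "OpenSSH_4.2"))
```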

Monitoring SMTP

The check_smtp plugin can be used to monitor your email servers. The commands.cfg file contains a command definition for using the check_smtp plugin, which looks like this:

    define command{
        command_name    check_smtp
        command_line    $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$
        }

A simple service definition for monitoring the SMTP server on remotehost would look like this:

    define service{
        use                     generic-service     ; Inherit default values from a template
        host_name               remotehost
        service_description     SMTP
        check_command           check_smtp
        }

This service definition will monitor the SMTP service and generate alerts if the SMTP server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the SMTP server and generate an alert if the server doesn't respond within 5 seconds or if the server response doesn't contain "teste.com".

    define service{
        use                     generic-service     ; Inherit default values from a template
        host_name               remotehost
        service_description     SMTP Response Check
        check_command           check_smtp!-t 5 -e "teste.com"
        }

Monitoring POP3

The check_pop plugin can be used to monitor the POP3 service on your email servers. The commands.cfg file contains a command definition for using the check_pop plugin, which looks like this:

    define command{
        command_name    check_pop
        command_line    $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$
        }

A simple service definition for monitoring the POP3 service on remotehost would look like this:

    define service{
        use                     generic-service     ; Inherit default values from a template
        host_name               remotehost
        service_description     POP3
        check_command           check_pop
        }

This service definition will monitor the POP3 service and generate alerts if the POP3 server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the POP3 service and generate an alert if the server doesn't respond within 5 seconds or if the server response doesn't contain "teste.com".

    define service{
        use                     generic-service     ; Inherit default values from a template
        host_name               remotehost
        service_description     POP3 Response Check
        check_command           check_pop!-t 5 -e "teste.com"
        }

Monitoring IMAP

The check_imap plugin can be used to monitor the IMAP4 service on your email servers. The commands.cfg file contains a command definition for using the check_imap plugin, which looks like this:

    define command{
        command_name    check_imap
        command_line    $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$
        }

A simple service definition for monitoring the IMAP4 service on remotehost would look like this:

    define service{
        use                     generic-service     ; Inherit default values from a template
        host_name               remotehost
        service_description     IMAP
        check_command           check_imap
        }

This service definition will monitor the IMAP4 service and generate alerts if the IMAP server doesn't respond within 10 seconds.

A more advanced service definition is shown below. This service will check the IMAP4 service and generate an alert if the server doesn't respond within 5 seconds or if the server response doesn't contain "teste.com".

    define service{
        use                     generic-service     ; Inherit default values from a template
        host_name               remotehost
        service_description     IMAP4 Response Check
        check_command           check_imap!-t 5 -e "teste.com"
        }

Monitoring MySQL

check_mysql_health

Description

check_mysql_health is a plugin to check various parameters of a MySQL database.

Command line parameters


--hostname <hostname>

The database server which should be monitored. For "localhost" this parameter can be omitted.

--username <username>

The database user.

--password <password>

Password of the database user.

--mode <mode>

With the mode parameter you tell the plugin what it should do. See the list of possible values further down.

--name <objectname>

Limits the check to a single object. (Currently this parameter is only used for mode=sql.)

--name2 <string>

If you use --mode=sql, the SQL statement appears in the output and the performance values. With --name2 you can specify a string to be shown in its place.

--warning <range>

Determined values outside of this range trigger a WARNING.

--critical <range>

Determined values outside of this range trigger a CRITICAL.

--environment <variable>=<value>

Passes environment variables to the script. Can be given multiple times.

--method <connectmethod>

Tells the plugin how it should connect to the database (dbi for using DBD::mysql (default), mysql for the mysql command line tool).

--units <%|KB|MB|GB>

Adds a unit to the output of mode=sql to make it more readable.

Use the option --mode with one of the following keywords to tell the plugin which values it should determine and check.

| Keyword | Description | Range (default warning, critical) |
|---|---|---|
| connection-time | Determines how long connection establishment and login take | 0..n seconds (1, 5) |
| uptime | Time since start of the database server (recognizes a DB crash and restart) | 0..n seconds (10:, 5: minutes) |
| threads-connected | Number of open connections | 1..n (10, 20) |
| threadcache-hitrate | Hit rate in the thread cache | 0%..100% (90:, 80:) |
| q[uery]cache-hitrate | Hit rate in the query cache | 0%..100% (90:, 80:) |
| q[uery]cache-lowmem-prunes | Evictions from the query cache due to memory shortage | n/sec (1, 10) |
| [myisam-]keycache-hitrate | Hit rate in the MyISAM key cache | 0%..100% (99:, 95:) |
| [innodb-]bufferpool-hitrate | Hit rate in the InnoDB buffer pool | 0%..100% (99:, 95:) |
| [innodb-]bufferpool-wait-free | Rate of InnoDB buffer pool waits | 0..n/sec (1, 10) |
| [innodb-]log-waits | Rate of InnoDB log waits | 0..n/sec (1, 10) |
| tablecache-hitrate | Hit rate in the table cache | 0%..100% (99:, 95:) |
| table-lock-contention | Rate of failed table locks | 0%..100% (1, 2) |
| index-usage | Sum of the index utilization (in contrast to full table scans) | 0%..100% (90:, 80:) |
| tmp-disk-tables | Percentage of temporary tables that were created on disk instead of in memory | 0%..100% (25, 50) |
| slow-queries | Rate of queries that were detected as "slow" | 0..n/sec (0.1, 1) |
| long-running-procs | Number of processes that have been running longer than 1 minute | 0..n (10, 20) |
| slave-lag | Delay between master and slave | 0..n seconds |
| slave-io-running | Checks if the IO thread of the slave DB is running | |
| slave-sql-running | Checks if the SQL thread of the slave DB is running | |
| sql | Result of any SQL statement that returns a number. The statement itself is passed with the parameter --name. A label for the performance data output can be passed with the parameter --name2. The parameter --units can add units to the output (%, c, s, MB, GB, ..). If the SQL statement includes special characters or spaces, it can first be encoded with the mode encode. | 0..n |
| open-files | Number of open files (as a percentage of the upper limit) | 0%..100% (80, 95) |
| encode | Reads standard input (STDIN) and outputs an encoded string | |
| cluster-ndb-running | Checks if all cluster nodes are running | |

Depending on the chosen mode, two labels can appear in the performance data output:

<label>= and <label_now>=

The first value covers the complete runtime of the database; the second covers the time since the last run of check_mysql_health.

Example: qcache_hitrate=71.63%;90:;80: qcache_hitrate_now=8.25%

The hit rate of the query cache is calculated as Qcache_hits / ( Qcache_hits + Com_select ). These values increase continuously, so a serious change in access behaviour affects the hit rate only slowly. To make temporary fluctuations in the hit rate visible and, for example, attribute them to an application update, the value qcache_hitrate_now is printed as well. It is calculated from the deltas of Qcache_hits and Com_select (the current value of each variable minus its value at the last run of check_mysql_health).

The command line parameter --lookback controls this calculation:

If it is missing, qcache_hitrate_now is calculated from the deltas of Qcache_hits and Com_select since the last run of check_mysql_health. The exit code of the plugin is determined by the long-term result qcache_hitrate (since database start).

If --lookback is specified with an argument n, qcache_hitrate_now is calculated from the deltas of Qcache_hits and Com_select over the last n seconds.

For example: with --lookback 3600 you get the average hit rate of the last hour, calculated back from the last plugin execution. The exit code now also depends on this short-term result.

It is recommended to use --lookback with at least half an hour (--lookback 1800), because the now-value fluctuates heavily, which would otherwise lead to frequent alarms.
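The two hit rates described above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, not the plugin's code: the counter snapshots are made-up numbers, and returning 0.0 when both counters are zero is an assumption.

```python
def hitrate(hits, selects):
    """Query cache hit rate in percent: hits / (hits + selects)."""
    total = hits + selects
    # Assumption: report 0.0 when there was no activity at all.
    return 100.0 * hits / total if total else 0.0


def hitrates(current, previous):
    """Return (long_term, now) from two snapshots of the status counters."""
    # Long-term value: counters since database start.
    long_term = hitrate(current["Qcache_hits"], current["Com_select"])
    # "Now" value: only the increase since the previous snapshot.
    delta_hits = current["Qcache_hits"] - previous["Qcache_hits"]
    delta_selects = current["Com_select"] - previous["Com_select"]
    return long_term, hitrate(delta_hits, delta_selects)


if __name__ == "__main__":
    previous = {"Qcache_hits": 7000, "Com_select": 3000}
    current = {"Qcache_hits": 7100, "Com_select": 3900}
    long_term, now = hitrates(current, previous)
    print(f"qcache_hitrate={long_term:.2f}% qcache_hitrate_now={now:.2f}%")
```

In this made-up case the long-term value barely moves while the now-value drops sharply, which is the situation the _now label is meant to expose.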

Please note that the thresholds must be specified according to the Nagios plug-in development guidelines.

"10" means "alarm if > 10" and

"90:" means "alarm if < 90"
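The range notation can be made concrete with a small Python sketch that follows the range format of the Nagios plug-in development guidelines (an optional leading "@", "~" for negative infinity, start defaulting to 0). The function names are hypothetical, and this is not the parser the plugin itself uses.

```python
def parse_range(spec):
    """Parse a threshold such as '10', '90:', '~:10', '10:20' or '@10:20'."""
    inside = spec.startswith("@")  # '@' inverts the logic: alert inside the range
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
    else:
        low_text, high_text = "0", spec  # '10' is shorthand for '0:10'
    low = float("-inf") if low_text == "~" else float(low_text or 0)
    high = float("inf") if high_text == "" else float(high_text)
    return low, high, inside


def alerts(value, spec):
    """Return True if the value should raise an alert for this threshold."""
    low, high, inside = parse_range(spec)
    in_range = low <= value <= high
    return in_range if inside else not in_range


def status(value, warning, critical):
    """Combine both thresholds; critical takes precedence over warning."""
    if alerts(value, critical):
        return "CRITICAL"
    if alerts(value, warning):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    print(status(70.97, "90:", "80:"))
    print(status(70.82, "80:", "70:"))
```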

Connect to the database

Creating a database user

In order to be able to collect the needed information from the database a database user with specific privileges is required:

GRANT usage ON *.* TO 'nagios'@'nagiosserver' IDENTIFIED BY 'nagiospassword'

Connection string

To connect to the database you use the parameters --username and --password. The database server to be used can be specified more precisely with --hostname and --socket or --port.

Use of environment variables

It is possible to omit --hostname, --username and --password as well as --socket and --port completely if you provide the corresponding values in environment variables. Since version 3.x, Nagios allows service definitions to be extended with your own attributes (custom object variables). These appear in the environment during the execution of the check command.

The environment variables are:

NAGIOS__SERVICEMYSQL_HOST (_mysql_host in the service definition)
NAGIOS__SERVICEMYSQL_USER (_mysql_user in the service definition)
NAGIOS__SERVICEMYSQL_PASS (_mysql_pass in the service definition)
NAGIOS__SERVICEMYSQL_PORT (_mysql_port in the service definition)
NAGIOS__SERVICEMYSQL_SOCK (_mysql_sock in the service definition)
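How a check script might pick up these variables can be sketched in Python as below. The function resolve_connection is hypothetical, and letting explicit command line values win over the environment is an assumption; the text above only states that the parameters may be omitted when the variables are set.

```python
import os

# Environment variable for each connection parameter, as listed above.
ENV_NAMES = {
    "hostname": "NAGIOS__SERVICEMYSQL_HOST",
    "username": "NAGIOS__SERVICEMYSQL_USER",
    "password": "NAGIOS__SERVICEMYSQL_PASS",
    "port": "NAGIOS__SERVICEMYSQL_PORT",
    "socket": "NAGIOS__SERVICEMYSQL_SOCK",
}


def resolve_connection(cli_args, environ=None):
    """Fill in connection parameters missing from the command line.

    cli_args -- dict of parameters given explicitly (missing or None = not given)
    environ  -- mapping to read from; defaults to the process environment
    """
    if environ is None:
        environ = os.environ
    resolved = {}
    for key, env_name in ENV_NAMES.items():
        value = cli_args.get(key)
        if value is None:
            # Assumed precedence: fall back to the environment only if unset.
            value = environ.get(env_name)
        if value is not None:
            resolved[key] = value
    return resolved


if __name__ == "__main__":
    fake_env = {"NAGIOS__SERVICEMYSQL_HOST": "mydb3",
                "NAGIOS__SERVICEMYSQL_USER": "nagios"}
    print(resolve_connection({}, fake_env))
```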

Examples

    nagios$ check_mysql_health --hostname mydb3 --username nagios --password nagios --mode connection-time
    OK - 0.03 seconds to connect as nagios | connection_time=0.0337s;1;5

    nagios$ check_oracle_health --mode=connection-time
    OK - 0.17 seconds to connect | connection_time=0.1740;1;5

    nagios$ check_mysql_health --mode querycache-hitrate
    CRITICAL - query cache hitrate 70.97% | qcache_hitrate=70.97%;90:;80: qcache_hitrate_now=72.25% selects_per_sec=270.00

    nagios$ check_mysql_health --mode querycache-hitrate --warning 80: --critical 70:
    WARNING - query cache hitrate 70.82% | qcache_hitrate=70.82%;80:;70: qcache_hitrate_now=62.82% selects_per_sec=420.17

    nagios$ check_mysql_health --mode sql --name 'select 111 from dual'
    CRITICAL - select 111 from dual: 111 | 'select 111 from dual'=111;1;5

    nagios$ echo 'select 111 from dual' | check_mysql_health --mode encode
    select%20111%20from%20dual

    nagios$ check_mysql_health --mode sql --name select%20111%20from%20dual
    CRITICAL - select 111 from dual: 111 | 'select 111 from dual'=111;1;5

    nagios$ check_mysql_health --mode sql --name select%20111%20from%20dual --name2 myval
    CRITICAL - myval: 111 | 'myval'=111;1;5

    nagios$ check_mysql_health --mode sql --name select%20111%20from%20dual --name2 myval --units GB
    CRITICAL - myval: 111GB | 'myval'=111GB;1;5

    nagios$ check_mysql_health --mode sql --name select%20111%20from%20dual --name2 myval --units GB --warning 100 --critical 110
    CRITICAL - myval: 111GB | 'myval'=111GB;100;110
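Each output line in the examples consists of a status text, a pipe, and performance data of the form label=value[unit];warning;critical. The Python sketch below splits such a line into its parts. The regular expression is a simplification written for the sample lines in this document (it ignores the optional min and max fields), and the function name is hypothetical.

```python
import re

# label (optionally single-quoted) = number + optional unit, then ;warn;crit
PERF_RE = re.compile(
    r"(?:'(?P<quoted>[^']+)'|(?P<bare>[^\s=']+))"
    r"=(?P<value>-?[\d.]+)(?P<uom>[^;\s]*)"
    r"(?:;(?P<warn>[^;\s]*))?(?:;(?P<crit>[^;\s]*))?"
)


def parse_plugin_output(line):
    """Return (status, metrics) for one line of plugin output."""
    text, _, perfdata = line.partition("|")
    # The status keyword precedes the first ' - ' in the text part.
    status = text.split(" - ", 1)[0].strip()
    metrics = {}
    for match in PERF_RE.finditer(perfdata):
        label = match.group("quoted") or match.group("bare")
        metrics[label] = {
            "value": float(match.group("value")),
            "uom": match.group("uom"),
            "warn": match.group("warn") or None,
            "crit": match.group("crit") or None,
        }
    return status, metrics


if __name__ == "__main__":
    sample = ("CRITICAL - query cache hitrate 70.97% | "
              "qcache_hitrate=70.97%;90:;80: qcache_hitrate_now=72.25% "
              "selects_per_sec=270.00")
    print(parse_plugin_output(sample))
```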

Installation

The plugin requires a mysql-client package to be installed. Installing the Perl modules DBI and DBD::mysql is desirable, but not mandatory.

After unpacking the archive, run ./configure. Calling ./configure --help prints the available options together with the default values used when building the plugin.

--prefix=BASEDIRECTORY

Specify a directory in which check_mysql_health should be stored. (default: /usr/local/nagios)

--with-nagios-user=SOMEUSER

This user will be the owner of the check_mysql_health file. (default: nagios)

--with-nagios-group=SOMEGROUP

The group of the check_mysql_health plugin. (default: nagios)

--with-perl=PATHTOPERL

Specify the path to the Perl interpreter you wish to use. (default: perl in PATH)

Download

check_mysql_health-2.1.8.2.tar.gz

Some versions of tar have problems with the long file names. In that case, please unpack the shar package with

cat check_mysql_health-xxx.shar.gz | gzip -d | sh


Monitoring MS SQL Server

check_mssql_health

Description

check_mssql_health is a plugin, which is used to monitor different parameters of a MS SQL server.

Documentation

Command line parameters

--hostname <hostname>

The database server.

--username <username>

The database user.

--password <password>

The database password.

--port <port>

The port on which the server listens. (Default: 1433)

--server <server>

An alternative to hostname+port. <server> will be looked up in the file freetds.conf.

--mode <mode>

With the mode parameter you tell the plugin what you want it to do. See the list below for possible values.

--name <objectname>

Several checks can be limited to a single object (e.g. a specific database). It is also used for mode=sql. (See the examples.)

--name2 <string>

If you use --mode=sql, the SQL statement will be shown in the plugin output and the performance data (which looks ugly). The parameter name2 can be used to provide a user-defined string.

--warning <range>

Values outside this range result in a WARNING.

--critical <range>

Values outside this range result in a CRITICAL.

--environment <variable>=<value>

Sets environment variables at runtime. It can be used multiple times.

--method <connectmethod>

Tells the plugin which connection method it should use. Known values are: dbi for the Perl module DBD::Sybase (default) and sqlrelay for the SQLRelay proxy.

--units <%|KB|MB|GB>

Adds units to the performance data when using mode=sql.

--dbthresholds

With this parameter, thresholds are read from the database table check_mssql_health_thresholds.

| Keyword | Meaning | Threshold range (default warning, critical) |
|---|---|---|
| connection-time | Measures how long it takes to log in | 0..n seconds (1, 5) |
| connected-users | Number of connected users | 0..n (50, 80) |
| cpu-busy | CPU busy time | 0%..100% (80, 90) |
| io-busy | IO busy time | 0%..100% (80, 90) |
| full-scans | Number of full table scans per second | 0..n (100, 500) |
| transactions | Number of transactions per second | 0..n (10000, 50000) |
| batch-requests | Number of batch requests per second | 0..n (100, 200) |
| latches-waits | Number of latch requests per second which could not be fulfilled | 0..n (10, 50) |
| latches-wait-time | Average time a latch request had to wait until it was granted | 0..n ms (1, 5) |
| locks-waits | Number of lock requests per second which could not be satisfied | 0..n (100, 500) |
| locks-timeouts | Number of lock requests per second which resulted in a timeout | 0..n (1, 5) |
| locks-deadlocks | Number of deadlocks per second | 0..n (1, 5) |
| sql-recompilations | Number of recompilations per second | 0..n (1, 10) |
| sql-initcompilations | Number of initial compilations per second | 0..n (100, 200) |
| total-server-memory | The main memory reserved for the SQL Server | 0..n (nearly 1G, 1G) |
| mem-pool-data-buffer-hit-ratio | Data buffer cache hit ratio | 0%..100% (90:, 80:) |
| lazy-writes | Number of lazy writes per second | 0..n (20, 40) |
| page-life-expectancy | Average time a page stays in main memory | 0..n (300:, 180:) |
| free-list-stalls | Free list stalls per second | 0..n (4, 10) |
| checkpoint-pages | Number of flushed dirty pages per second | 0..n () |
| database-online | Checks whether a database is online and accepts connections | - |
| database-free | Free space in a database (default is percent, but --units can also be used). You can select a single database with the name parameter | 0%..100% (5:, 2:) |
| database-backup-age | Elapsed time since a database was last backed up (in hours). The performance data also cover the time needed for the backup (in minutes) | 0..n |
| database-logbackup-age | Elapsed time since a database log was last backed up (in hours). The performance data also cover the time needed for the backup (in minutes) | 0..n |
| database-file-auto-growths | The number of file auto grow events (either data or log) in the last <n> minutes (use --lookback) | 0..n (1, 5) |
| database-logfile-auto-growths | The number of log file auto grow events in the last <n> minutes (use --lookback) | 0..n (1, 5) |
| database-datafile-auto-growths | The number of data file auto grow events in the last <n> minutes (use --lookback) | 0..n (1, 5) |
| database-file-auto-shrinks | The number of file auto shrink events (either data or log) in the last <n> minutes (use --lookback) | 0..n (1, 5) |
| database-logfile-auto-shrinks | The number of log file auto shrink events in the last <n> minutes (use --lookback) | 0..n (1, 5) |
| database-datafile-auto-shrinks | The number of data file auto shrink events in the last <n> minutes (use --lookback) | 0..n (1, 5) |
| database-file-dbcc-shrinks | The number of DBCC file shrink events (either data or log) in the last <n> minutes (use --lookback) | 0..n (1, 5) |
| failed-jobs | The number of jobs which did not exit successfully in the last <n> minutes (use --lookback) | 0..n (1, 5) |
| sql | Result of a user-defined SQL statement which returns a numerical value. The statement is passed to the plugin as an argument to the --name parameter. A label for the performance data can be defined with the --name2 parameter. A unit can be appended by using --units. If the SQL statement contains special characters, it is recommended to encode it first by calling check_mssql_health with the --mode encode parameter and sending the statement to STDIN | 0..n |
| sql-runtime | Runtime of a custom SQL statement in seconds | 0..n (1, 5) |
| list-databases | Returns a list of all databases | - |
| list-locks | Returns a list of all locks | - |

Please keep the Nagios Developer Guidelines in mind when you use thresholds. "10" means "alarm if > 10" and "90:" means "alarm if < 90".

Preparation of the database

In order for the plugin to operate correctly, a database user with specific privileges is required. The simplest way is to assign the Nagios user the role "serveradmin". As an alternative you can use the sa user for the database connection. However, this opens a serious security hole, as the (cleartext) administrator password can be found in the Nagios configuration files. Birk Bohne wrote the following script, which automates the creation of a monitoring user with minimal yet sufficient privileges.

    declare @dbname varchar(255)
    declare @check_mssql_health_USER varchar(255)
    declare @check_mssql_health_PASS varchar(255)
    declare @check_mssql_health_ROLE varchar(255)
    declare @source varchar(255)
    declare @options varchar(255)
    declare @backslash int
    /*******************************************************************/
    SET @check_mssql_health_USER = '"[Servername|Domainname]\Username"'
    SET @check_mssql_health_PASS = 'Password'
    SET @check_mssql_health_ROLE = 'Rolename'
    /*******************************************************************
    PLEASE CHANGE THE ABOVE VALUES ACCORDING TO YOUR REQUIREMENTS
    - Example for Windows authentication:
      SET @check_mssql_health_USER = '"[Servername|Domainname]\Username"'
      SET @check_mssql_health_ROLE = 'Rolename'
    - Example for SQL Server authentication:
      SET @check_mssql_health_USER = 'Username'
      SET @check_mssql_health_PASS = 'Password'
      SET @check_mssql_health_ROLE = 'Rolename'
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    It is strongly recommended to use Windows authentication. Otherwise
    you will get no reliable results for database usage.
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    *********** NO NEED TO CHANGE ANYTHING BELOW THIS LINE *************/
    SET @options = 'DEFAULT_DATABASE=MASTER, DEFAULT_LANGUAGE=English'
    SET @backslash = (SELECT CHARINDEX('\', @check_mssql_health_USER))
    IF @backslash > 0
      BEGIN
        SET @source = ' FROM WINDOWS'
        SET @options = ' WITH ' + @options
      END
    ELSE
      BEGIN
        SET @source = ''
        SET @options = ' WITH PASSWORD=''' + @check_mssql_health_PASS + ''',' + @options
      END
    PRINT 'create Nagios plugin user ' + @check_mssql_health_USER
    EXEC ('CREATE LOGIN ' + @check_mssql_health_USER + @source + @options)
    EXEC ('USE MASTER GRANT VIEW SERVER STATE TO ' + @check_mssql_health_USER)
    PRINT 'User ' + @check_mssql_health_USER + ' created.'
    PRINT ''
    declare dblist cursor for
      select name from sysdatabases WHERE name NOT IN ('master', 'tempdb', 'msdb')
    open dblist
    fetch next from dblist into @dbname
    while @@fetch_status = 0 begin
      EXEC ('USE [' + @dbname + '] print ''Grant permissions in the db '' + ''"'' + DB_NAME() + ''"''')
      EXEC ('USE [' + @dbname + '] CREATE ROLE ' + @check_mssql_health_ROLE)
      EXEC ('USE [' + @dbname + '] GRANT EXECUTE TO ' + @check_mssql_health_ROLE)
      EXEC ('USE [' + @dbname + '] GRANT VIEW DATABASE STATE TO ' + @check_mssql_health_ROLE)
      EXEC ('USE [' + @dbname + '] GRANT VIEW DEFINITION TO ' + @check_mssql_health_ROLE)
      EXEC ('USE [' + @dbname + '] CREATE USER ' + @check_mssql_health_USER + ' FOR LOGIN ' + @check_mssql_health_USER)
      EXEC ('USE [' + @dbname + '] EXEC sp_addrolemember ' + @check_mssql_health_ROLE + ' , ' + @check_mssql_health_USER)
      EXEC ('USE [' + @dbname + '] print ''Permissions in the db '' + ''"'' + DB_NAME() + ''" granted.''')
      fetch next from dblist into @dbname
    end
    close dblist
    deallocate dblist

Please keep in mind that check_mssql_health's functionality is limited when using SQL Server authentication. This method is strongly discouraged. Normally there is already a Nagios (Windows) user which can be used for the Windows authentication method.

Another script from the same author removes the monitoring user from the database.

    declare @dbname varchar(255)
    declare @check_mssql_health_USER varchar(255)
    declare @check_mssql_health_ROLE varchar(255)
    SET @check_mssql_health_USER = '"[Servername|Domainname]\Username"'
    SET @check_mssql_health_ROLE = 'Rolename'
    declare dblist cursor for
      select name from sysdatabases WHERE name NOT IN ('master', 'tempdb', 'msdb')
    open dblist
    fetch next from dblist into @dbname
    while @@fetch_status = 0 begin
      EXEC ('USE [' + @dbname + '] print ''Revoke permissions in the db '' + ''"'' + DB_NAME() + ''"''')
      EXEC ('USE [' + @dbname + '] EXEC sp_droprolemember ' + @check_mssql_health_ROLE + ' , ' + @check_mssql_health_USER)
      EXEC ('USE [' + @dbname + '] DROP USER ' + @check_mssql_health_USER)
      EXEC ('USE [' + @dbname + '] REVOKE VIEW DEFINITION TO ' + @check_mssql_health_ROLE)
      EXEC ('USE [' + @dbname + '] REVOKE VIEW DATABASE STATE TO ' + @check_mssql_health_ROLE)
      EXEC ('USE [' + @dbname + '] REVOKE EXECUTE TO ' + @check_mssql_health_ROLE)
      EXEC ('USE [' + @dbname + '] DROP ROLE ' + @check_mssql_health_ROLE)
      EXEC ('USE [' + @dbname + '] print ''Permissions in the db '' + ''"'' + DB_NAME() + ''" revoked.''')
      fetch next from dblist into @dbname
    end
    close dblist
    deallocate dblist
    PRINT ''
    PRINT 'drop Nagios plugin user ' + @check_mssql_health_USER
    EXEC ('USE MASTER REVOKE VIEW SERVER STATE TO ' + @check_mssql_health_USER)
    EXEC ('DROP LOGIN ' + @check_mssql_health_USER)
    PRINT 'User ' + @check_mssql_health_USER + ' dropped.'

Many thanks to Birk Bohne for the excellent scripts.

Examples

    nagsrv$ check_mssql_health --mode mem-pool-data-buffer-hit-ratio
    CRITICAL - buffer cache hit ratio is 71.21% | buffer_cache_hit_ratio=71.21%;90:;80:
    nagsrv$ check_mssql_health --mode batch-requests
    OK - 9.00 batch requests / sec | batch_requests_per_sec=9.00;100;200
    nagsrv$ check_mssql_health --mode full-scans
    OK - 6.14 full table scans / sec | full_scans_per_sec=6.14;100;500
    nagsrv$ check_mssql_health --mode cpu-busy
    OK - CPU busy 55.00% | cpu_busy=55.00;80;90
    nagsrv$ check_mssql_health --mode database-free --name AdventureWorks
    OK - database AdventureWorks has 21.59% free space left | 'db_adventureworks_free_pct'=21.59%;5:;2: 'db_adventureworks_free'=703MB;4768371582.03:;1907348632.81:;0;95367431640.62
    nagsrv$ check_mssql_health --mode database-free --name AdventureWorks \
        --warning 700: --critical 200: --units MB
    WARNING - database AdventureWorks has 694.12MB free space left | 'db_adventureworks_free_pct'=21.31%;0.00:;0.00: 'db_adventureworks_free'=694.12MB;700.00:;200.00:;0;95367431640.62
    nagsrv$ check_mssql_health --mode page-life-expectancy
    OK - page life expectancy is 8950 seconds | page_life_expectancy=8950;300:;180:
    nagsrv$ check_mssql_health --mode database-backup-age --name AHLE_WORSCHT \
        --warning 72 --critical 120
    WARNING - AHLE_WORSCHT backupped 102h ago | 'AHLE_WORSCHT_bck_age'=102;72;120 'AHLE_WORSCHT_bck_time'=12
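Each output line above has the same shape: a status word, a human-readable message, and after the "|" one or more performance-data items of the form label=value[unit];warning;critical. The following is a minimal Python sketch of how such a line could be taken apart. It is not part of check_mssql_health; the function name parse_plugin_output and the two regular expressions are illustrative assumptions, and only the first three fields of each item are kept.

```python
import re

# One performance-data item: a quoted or bare label, '=', then the value fields.
PERF_ITEM = re.compile(r"(?:'([^']+)'|([^\s=']+))=(\S*)")
# A number followed by an optional unit such as '%' or 'MB'.
VALUE = re.compile(r"(-?\d+(?:\.\d+)?)(.*)")


def parse_plugin_output(line):
    """Return (status, {label: (value, unit, warning, critical)}) for one output line."""
    text, _, perf = line.partition("|")
    status = text.split(" - ", 1)[0].strip()
    metrics = {}
    for quoted, bare, rest in PERF_ITEM.findall(perf):
        fields = rest.split(";") + ["", ""]  # pad so warning/critical always exist
        match = VALUE.match(fields[0])
        if match is None:
            continue  # skip items whose value is not a plain number
        metrics[quoted or bare] = (
            float(match.group(1)),
            match.group(2),
            fields[1],
            fields[2],
        )
    return status, metrics


status, metrics = parse_plugin_output(
    "CRITICAL - buffer cache hit ratio is 71.21% | buffer_cache_hit_ratio=71.21%;90:;80:"
)
print(status, metrics)
```

Quoted labels are handled separately because they may contain spaces, as in some of the Oracle examples later in this document.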


Using environment variables

You can omit the parameters --hostname, --port (or the alternative --server), --username and --password completely if you pass the respective data via environment variables. Since version 3.x, Nagios lets you add your own attributes to service definitions (custom object variables). They appear as environment variables while a plugin runs.

The environment variables are:

    NAGIOS__SERVICEMSSQL_HOST (_mssql_host in the service definition)
    NAGIOS__SERVICEMSSQL_USER (_mssql_user in the service definition)
    NAGIOS__SERVICEMSSQL_PASS (_mssql_pass in the service definition)
    NAGIOS__SERVICEMSSQL_PORT (_mssql_port in the service definition)
    NAGIOS__SERVICEMSSQL_SERVER (_mssql_server in the service definition)
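The naming pattern in this list is regular: the custom variable loses its leading underscore, is upper-cased, and is prefixed with NAGIOS__SERVICE. The Python sketch below illustrates that mapping and a possible lookup order. The helper names custom_var_to_env and resolve are hypothetical, and the rule that an explicit command-line value wins over the environment is an assumption made for the example, not something this document states about the plugin.

```python
import os


def custom_var_to_env(name, object_type="SERVICE"):
    """Map a custom object variable such as '_mssql_host' to its environment name."""
    if not name.startswith("_"):
        raise ValueError("custom object variables start with an underscore")
    return "NAGIOS__" + object_type + name[1:].upper()


def resolve(cli_value, custom_var, environ=os.environ):
    """Use the command-line value if given (assumed precedence), else the environment."""
    if cli_value is not None:
        return cli_value
    return environ.get(custom_var_to_env(custom_var))


# Simulated environment, as Nagios might export it for a service check.
fake_env = {"NAGIOS__SERVICEMSSQL_HOST": "192.168.1.19"}
print(custom_var_to_env("_mssql_host"))
print(resolve(None, "_mssql_host", fake_env))
```

Passing the environment as a parameter keeps the lookup easy to try out without touching the real process environment.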

Installation

This Plugin requires the installation of the Perl-module DBD::Sybase.

After unpacking the archive, run ./configure. ./configure --help prints a list of possible options.

--prefix=BASEDIRECTORY The directory where check_mssql_health will be installed (default: /usr/local/nagios)

--with-nagios-user=SOMEUSER The user who owns check_mssql_health. (default: nagios)

--with-nagios-group=SOMEGROUP The group which owns the check_mssql_health binaries. (default: nagios)

--with-perl=PATHTOPERL The path to a perl interpreter if you want to use a non-standard one. (default: the perl found in $PATH)

Security advice

The Perl module DBD::Sybase is based on an installation of FreeTDS, the package responsible for communication with the database server. The default settings use protocol version 4.x, which sends passwords over the wire in cleartext. Please change the following parameter in the file /etc/freetds.conf.

    [global]
        # TDS protocol version
        # tds version = 4.2
        tds version = 8.0

Instances


If multiple named instances are listening on the same port of your database server, you need to register them individually in the file /etc/freetds.conf.

    [dbsrv1instance01]
        host = 192.168.1.19
        port = 1433
        instance = instance01
    [dbsrv1instance02]
        host = 192.168.1.19
        port = 1433
        instance = instance02

Now you can address the instances, e.g. with --server dbsrv1instance02. Using --host 192.168.1.19 --port 1433 would reach the default instance.
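To make the relationship between a --server name and its freetds.conf entry concrete, here is a small Python sketch that looks up host, port and instance for a given entry. It only illustrates the mapping: FreeTDS does its own parsing, the function name lookup_server is hypothetical, and using Python's configparser assumes a simple file like the one shown in this section.

```python
import configparser

# Sample configuration, mirroring the entries shown in this section.
FREETDS_CONF = """\
[global]
# TDS protocol version
tds version = 8.0

[dbsrv1instance01]
host = 192.168.1.19
port = 1433
instance = instance01

[dbsrv1instance02]
host = 192.168.1.19
port = 1433
instance = instance02
"""


def lookup_server(conf_text, server):
    """Return (host, port, instance) for one named entry of a freetds.conf text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    entry = parser[server]  # raises KeyError if the entry is not registered
    return entry["host"], entry.getint("port"), entry.get("instance")


print(lookup_server(FREETDS_CONF, "dbsrv1instance02"))
```

Both entries share host and port; only the instance value tells them apart, which is why each named instance needs its own section.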

Download

check_mssql_health-1.5.19.1.tar.gz

Monitoramento Oracle

check_oracle_health

Description

check_oracle_health is a plugin to check various parameters of an Oracle database.

Documentation

Command line parameters

--connect= The database name

--user= The database user

--password= Password of the database user.

--connect= Alternative to the parameters above.

--connect=sysdba@ Login with / as sysdba (if the user that executes the plugin is privileged to do this)

--connect=/@token Login with help of the Password Store (assumes --method=sqlplus)


--mode= With the mode parameter you tell the plugin what it should do. See the list of possible values further down.

--tablespace= With this you can limit the check to a single tablespace. If this parameter is omitted, all tablespaces are checked.

--datafile= With this you can limit the check to a single datafile. If this parameter is omitted, all datafiles are checked.

--name= Here the check can be limited to a single object (Latch, Enqueue, Tablespace, Datafile). If this parameter is omitted, all objects are checked. (This parameter can and should be used instead of --tablespace or --datafile. It serves to standardize the command line interface.)

--name2= If you use --mode=sql, the SQL statement appears in the output and the performance values. With the parameter --name2 you can specify a string to use instead.

--regexp With this switch, the value of the --name parameter is interpreted as a regular expression.

--warning= Determined values outside of this range trigger a WARNING.

--critical= Determined values outside of this range trigger a CRITICAL.

--absolute Without --absolute, values that increase over time are reported as the increase per second; with --absolute, the difference between the current and the last run is reported.

--runas= With this parameter it is possible to run the script under a different user. (It calls sudo -u internally.)

--environment = With this you can pass environment variables to the script. For example: --environment ORACLE_HOME=/u01/oracle. Multiple declarations are possible.

--method= With this parameter you tell the plugin how it should connect to the database. (dbi for using DBD::Oracle (default), sqlplus for using the sqlplus tool).

--units=<%|KB|MB|GB> Declaring units serves to “beautify” the output of mode=sql and to simplify threshold values when using mode=tablespace-free.

--dbthresholds With this parameter, thresholds are read from the database table check_oracle_health_thresholds.

--statefilesdir This parameter tells the plugin not to use the default directory for temporary files, but a user-specified one. It can be important in a clustered environment with shared filesystems.

Use the option --mode with various keywords to tell the plugin which values it should determine and check.

Keyword | Description | Range (default warning, critical thresholds)
tnsping | Listener |
connection-time | Determines how long connection establishment and login take | 0..n Seconds (1, 5)
connected-users | The sum of logged in users at the database | 0..n (50, 100)
session-usage | Percentage of max possible sessions | 0%..100% (80, 90)
process-usage | Percentage of max possible processes | 0%..100% (80, 90)
rman-backup-problems | Number of RMAN errors during the last three days | 0..n (1, 2)
sga-data-buffer-hit-ratio | Hitrate in the Data Buffer Cache | 0%..100% (98:, 95:)
sga-library-cache-gethit-ratio | Hitrate in the Library Cache (Gets) | 0%..100% (98:, 95:)
sga-library-cache-pinhit-ratio | Hitrate in the Library Cache (Pins) | 0%..100% (98:, 95:)
sga-library-cache-reloads | Reload-Rate in the Library Cache | n/sec (10,10)
sga-dictionary-cache-hit-ratio | Hitrate in the Dictionary Cache | 0%..100% (95:, 90:)
sga-latches-hit-ratio | Hitrate of the Latches | 0%..100% (98:, 95:)
sga-shared-pool-reloads | Reload-Rate in the Shared Pool | 0%..100% (1, 10)
sga-shared-pool-free | Free Memory in the Shared Pool | 0%..100% (10:, 5:)
pga-in-memory-sort-ratio | Percentage of sorts in the memory. | 0%..100% (99:, 90:)
invalid-objects | Sum of faulty Objects, Indices, Partitions |
stale-statistics | Sum of objects with obsolete optimizer statistics | n (10, 100)
tablespace-usage | Used diskspace in the tablespace | 0%..100% (90, 98)
tablespace-free | Free diskspace in the tablespace | 0%..100% (5:, 2:)
tablespace-fragmentation | Free Space Fragmentation Index | 100..1 (30:, 20:)
tablespace-io-balance | IO distribution across the datafiles of a tablespace | n (1.0, 2.0)
tablespace-remaining-time | Number of remaining days until a tablespace is 100% used. The rate of increase is calculated from the values of the last 30 days. (Different periods can be specified with the parameter --lookback) | Days (90:, 30:)
tablespace-can-allocate-next | Checks if there is enough free tablespace for the next Extent. |
flash-recovery-area-usage | Used diskspace in the flash recovery area | 0%..100% (90, 98)
flash-recovery-area-free | Free diskspace in the flash recovery area | 0%..100% (5:, 2:)
datafile-io-traffic | Sum of IO operations from datafiles per second | n/sec (1000, 5000)
datafiles-existing | Percentage of max possible datafiles | 0%..100% (80, 90)
soft-parse-ratio | Percentage of soft-parse-ratio | 0%..100%
switch-interval | Interval between RedoLog File Switches | 0..n Seconds (600:, 60:)
retry-ratio | Retry-Rate in the RedoLog Buffer | 0%..100% (1, 10)
redo-io-traffic | Redolog IO in MB/sec | n/sec (199,200)
roll-header-contention | Rollback Segment Header Contention | 0%..100% (1, 2)
roll-block-contention | Rollback Segment Block Contention | 0%..100% (1, 2)
roll-hit-ratio | Rollback Segment gets/waits Ratio | 0%..100% (99:, 98:)
roll-extends | Rollback Segment Extends | n, n/sec (1, 100)
roll-wraps | Rollback Segment Wraps | n, n/sec (1, 100)
seg-top10-logical-reads | Sum of the userprocesses under the top 10 logical reads | n (1, 9)
seg-top10-physical-reads | Sum of the userprocesses under the top 10 physical reads | n (1, 9)
seg-top10-buffer-busy-waits | Sum of the userprocesses under the top 10 buffer busy waits | n (1, 9)
seg-top10-row-lock-waits | Sum of the userprocesses under the top 10 row lock waits | n (1, 9)
event-waits | Waits/sec from system events | n/sec (10,100)
event-waiting | Percentage of the elapsed time an event has spent waiting | 0%..100% (0.1,0.5)
enqueue-contention | Enqueue wait/request-Ratio | 0%..100% (1, 10)
enqueue-waiting | Percentage of the elapsed time since the last run an Enqueue has spent waiting | 0%..100% (0.00033,0.0033)
latch-contention | Latch misses/gets-ratio. With --name a latch name or latch number can be passed. (See list-latches) | 0%..100% (1,2)
latch-waiting | Percentage of the elapsed time since the last run a Latch has spent waiting | 0%..100% (0.1,1)
sysstat | Changes/sec for any value from v$sysstat | n/sec (10,10)
sql | Result of any SQL statement that returns a number. The statement itself is passed with the parameter --name. A label for the performance data output can be passed with the parameter --name2. | n (1,5)
sql-runtime | The time an sql command needs to run | Seconds (1, 5)
list-tablespaces | Prints a list of tablespaces |
list-datafiles | Prints a list of datafiles |
list-latches | Prints a list with latch names and latch numbers |
list-enqueues | Prints a list with the Enqueue names |
list-events | Prints a list with the events from (v$system_event). Besides event_number/event_id, a shortened form of the event name is printed. This can be used as a Nagios service description. Example: lo_fi_sw_co = log file switch completion |
list-background-events | Prints a list with the Background-Events |
list-sysstats | Prints a list with system-wide statistics |

Measurements that depend on a time interval can be evaluated in two ways. Calculating the result requires a start value, an end value and the time elapsed between these two values. Without further options, the initial value is the value from the last plugin run, and the elapsed time is normally the normal_check_interval of the corresponding service.

If the check result should depend not on the increase per second but on the difference between two measured values, use the option --absolute. This is useful for Rollback Segment Wraps, which happen so rarely that their rate is nearly 0/sec. Nevertheless, you want to be alerted if the number of these events grows.
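The difference between the two evaluation styles can be shown with a few lines of Python. This is only a sketch of the arithmetic described above, not the plugin's code; the function name counter_result and its parameters are illustrative assumptions.

```python
def counter_result(previous, current, elapsed_seconds, absolute=False):
    """Evaluate a growing counter either as a per-second rate or as a plain difference."""
    delta = current - previous
    if absolute:
        # Corresponds to the idea behind --absolute: report the raw increase.
        return float(delta)
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time between the two runs must be positive")
    return delta / elapsed_seconds


# Three rollback segment wraps within a 300-second check interval:
print(counter_result(100, 103, 300))                  # tiny rate per second
print(counter_result(100, 103, 300, absolute=True))   # difference between the runs
```

With a rate that small, a threshold such as 1 would never trigger, while the same threshold applied to the difference would.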


The threshold values should be chosen so that they can be reached during a retry_check_interval. If not, the service will change back into the OK state after each SOFT;1.

Please note that the thresholds must be specified according to the Nagios plug-in development guidelines.

“10” means “Alarm, if > 10” and

“90:” means “Alarm, if < 90”
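The two notations can be checked with a short Python sketch. It follows the range syntax of the Nagios plug-in development guidelines as commonly documented (a bare number N stands for the range 0..N, "N:" for N and above, "~" for minus infinity, a leading "@" inverts the test). The function names is_alert and status are hypothetical, and this is not the plugin's own implementation.

```python
import math


def is_alert(value, spec):
    """Return True if value violates the threshold range given by spec."""
    inside = spec.startswith("@")  # '@' means: alert when the value is inside the range
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
    else:
        low_text, high_text = "0", spec  # "10" is shorthand for the range 0..10
    low = -math.inf if low_text == "~" else float(low_text or 0)
    high = math.inf if high_text == "" else float(high_text)
    in_range = low <= value <= high
    return in_range if inside else not in_range


def status(value, warning, critical):
    """Combine both thresholds into a state; critical is checked first."""
    if is_alert(value, critical):
        return "CRITICAL"
    if is_alert(value, warning):
        return "WARNING"
    return "OK"


print(status(8.91, "10:", "5:"))   # shared pool free example from this document
print(status(55.0, "80", "90"))    # cpu-busy example from this document
```

Note that under this syntax a bare "10" also alerts on negative values, because the implied lower bound is 0.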

Preparation of the database

In order to be able to collect the needed information from the database a database user with specific privileges is required:

    CREATE USER nagios IDENTIFIED BY oradbmon;
    GRANT CREATE SESSION TO nagios;
    GRANT SELECT any dictionary TO nagios;
    GRANT SELECT ON V_$SYSSTAT TO nagios;
    GRANT SELECT ON V_$INSTANCE TO nagios;
    GRANT SELECT ON V_$LOG TO nagios;
    GRANT SELECT ON SYS.DBA_DATA_FILES TO nagios;
    GRANT SELECT ON SYS.DBA_FREE_SPACE TO nagios;
    --
    -- if somebody still uses Oracle 8.1.7...
    GRANT SELECT ON sys.dba_tablespaces TO nagios;
    GRANT SELECT ON dba_temp_files TO nagios;
    GRANT SELECT ON sys.v_$Temp_extent_pool TO nagios;
    GRANT SELECT ON sys.v_$TEMP_SPACE_HEADER TO nagios;
    GRANT SELECT ON sys.v_$session TO nagios;

Examples

    nagios$ check_oracle_health --connect bba --mode tnsping
    OK - connection established to bba.
    nagios$ check_oracle_health --mode connection-time
    OK - 0.17 seconds to connect | connection_time=0.1740;1;5
    nagios$ check_oracle_health --mode sga-data-buffer-hit-ratio
    CRITICAL - SGA data buffer hit ratio 0.99% | sga_data_buffer_hit_ratio=0.99%;98:;95:
    nagios$ check_oracle_health --mode sga-library-cache-hit-ratio
    OK - SGA library cache hit ratio 98.75% | sga_library_cache_hit_ratio=98.75%;98:;95:
    nagios$ check_oracle_health --mode sga-latches-hit-ratio
    OK - SGA latches hit ratio 100.00% | sga_latches_hit_ratio=100.00%;98:;95:
    nagios$ check_oracle_health --mode sga-shared-pool-reloads
    OK - SGA shared pool reloads 0.28% | sga_shared_pool_reloads=0.28%;1;10
    nagios$ check_oracle_health --mode sga-shared-pool-free
    WARNING - SGA shared pool free 8.91% | sga_shared_pool_free=8.91%;10:;5:
    nagios$ check_oracle_health --mode pga-in-memory-sort-ratio
    OK - PGA in-memory sort ratio 100.00% | pga_in_memory_sort_ratio=100.00;99:;90:
    nagios$ check_oracle_health --mode invalid-objects
    OK - no invalid objects found | invalid_ind_partitions=0 invalid_indexes=0 invalid_objects=0 unrecoverable_datafiles=0
    nagios$ check_oracle_health --mode switch-interval
    OK - Last redo log file switch interval was 18 minutes | redo_log_file_switch_interval=1090s;600:;60:
    nagios$ check_oracle_health --mode switch-interval --connect rac1
    OK - Last redo log file switch interval was 32 minutes (thread 1)| redo_log_file_switch_interval=1938s;600:;60:
    nagios$ check_oracle_health --mode tablespace-usage
    CRITICAL - tbs SYSTEM usage is 99.33% tbs SYSAUX usage is 93.73% tbs USERS usage is 8.75% tbs UNDOTBS1 usage is 6.65% | 'tbs_users_usage_pct'=8%;90;98 'tbs_users_usage'=0MB;4;4;0;5 'tbs_undotbs1_usage_pct'=6%;90;98 'tbs_undotbs1_usage'=11MB;153;166;0;170 'tbs_system_usage_pct'=99%;90;98 'tbs_system_usage'=695MB;630;686;0;700 'tbs_sysaux_usage_pct'=93%;90;98 'tbs_sysaux_usage'=802MB;770;839;0;856
    nagios$ check_oracle_health --mode tablespace-usage --tablespace USERS
    OK - tbs USERS usage is 8.75% | 'tbs_users_usage_pct'=8%;90;98 'tbs_users_usage'=0MB;4;4;0;5
    nagios$ check_oracle_health --mode tablespace-usage --name USERS
    OK - tbs USERS usage is 8.75% | 'tbs_users_usage_pct'=8%;90;98 'tbs_users_usage'=0MB;4;4;0;5
    nagios$ check_oracle_health --mode tablespace-free --name TEST
    OK - tbs TEST has 97.91% free space left | 'tbs_test_free_pct'=97.91%;5:;2: 'tbs_test_free'=32083MB;1638.40:;655.36:;0.00;32767.98
    nagios$ check_oracle_health --mode tablespace-free --name TEST --units MB --warning 100: --critical 50:
    OK - tbs TEST has 32083.61MB free space left | 'tbs_test_free_pct'=97.91%;0.31:;0.15: 'tbs_test_free'=32083.61MB;100.00:;50.00:;0;32767.98
    nagios$ check_oracle_health --mode tablespace-free --name TEST --warning 10: --critical 5:
    OK - tbs TEST has 97.91% free space left | 'tbs_test_free_pct'=97.91%;10:;5: 'tbs_test_free'=32083MB;3276.80:;1638.40:;0.00;32767.98
    nagios$ check_oracle_health --mode tablespace-remaining-time --tablespace ARUSERS --lookback 7
    WARNING - tablespace ARUSERS will be full in 78 days | 'tbs_arusers_days_until_full'=78;90:;30:
    nagios$ check_oracle_health --mode flash-recovery-area-free
    OK - flra /u00/app/oracle/flash_recovery_area has 100.00% free space left | 'flra_free_pct'=100.00%;5:;2: 'flra_free'=2048MB;102.40:;40.96:;0;2048.00
    nagios$ check_oracle_health --mode flash-recovery-area-free --units KB --warning 1000: --critical 500:
    OK - flra /u00/app/oracle/flash_recovery_area has 2097152.00KB free space left | 'flra_free_pct'=100.00%;0.05:;0.02: 'flra_free'=2097152.00KB;1000.00:;500.00:;0;2097152.00
    nagios$ check_oracle_health --mode datafile-io-traffic --datafile users01.dbf
    WARNING - users01.dbf: 1049.83 IO Operations per Second | 'dbf_users01.dbf_io_total_per_sec'=1049.83;1000;5000
    nagios$ check_oracle_health --mode latch-contention --name 214
    OK - SGA latch library cache (214) contention 0.08% | 'latch_214_contention'=0.08%;1;2 'latch_214_sleep_share'=0.00% 'latch_214_gets'=49995
    nagios$ check_oracle_health --mode latch-contention --name 'library cache'
    OK - SGA latch library cache (214) contention 0.08% | 'latch_214_contention'=0.08%;1;2 'latch_214_sleep_share'=0.00% 'latch_214_gets'=49937
    nagios$ check_oracle_health --mode enqueue-contention --name TC
    CRITICAL - enqueue TC: 19.90% of the requests must wait | 'TC_contention'=19.90%;1;10 'TC_requests'=2015 'TC_waits'=401
    nagios$ check_oracle_health --mode latch-contention --name 'messages'
    OK - SGA latch messages (17) contention 0.02% | 'latch_17_contention'=0.02%;1;2 'latch_17_gets'=4867
    nagios$ check_oracle_health --mode latch-waiting --name 'user lock'
    OK - SGA latch user lock (205) sleeping 0.000841% of the time | 'latch_205_sleep_share'=0.000841%
    nagios$ check_oracle_health --mode event-waits --name 'log file sync'
    OK - log file sync : 1.839511 waits/sec | 'log file sync_waits_per_sec'=1.839511;10;100
    nagios$ check_oracle_health --mode event-waiting --name 'Log file parallel write'
    OK - log file parallel write waits 0.045843% of the time | 'log file parallel write_percent_waited'=0.045843%;0.1;0.5
    nagios$ check_oracle_health --mode sysstat --name 'transaction rollbacks'
    OK - 0.000003 transaction rollbacks/sec | 'transaction rollbacks_per_sec'=0.000003;10;100 'transaction rollbacks'=4
    nagios$ check_oracle_health --mode sql --name 'select count(*) from v$session' --name2 sessions
    CRITICAL - sessions: 21 | 'sessions'=21;1;5
    nagios$ check_oracle_health --mode sql --name 'select 12 from dual' --name2 twelve --units MB
    CRITICAL - twelfe: 12MB | 'twelfe'=12MB;1;5
    nagios$ check_oracle_health --mode sql --name 'select 200,300,1000 from dual' --name2 'kaspar melchior balthasar' --warning 180 --critical 500
    WARNING - kaspar melchior balthasar: 200 300 1000 | 'kaspar'=200;180;500 'melchior'=300;; 'balthasar'=1000;;
    nagios$ check_oracle_health --mode sql --name "select 'abc123' from dual" --name2 \\d --regexp
    OK - output abc123 matches pattern \d

Authentication

Example with --runas and an “external user”

There are two users in the database:

    OPS$DBNAGIO IDENTIFIED EXTERNALLY
    NAGIOS IDENTIFIED BY ‘DBMONI’

There are two unix users:

qqnagio with normal access.
dbnagio with /bin/false as login shell.

    qqnagio$ check_oracle_health --mode=connection-time --connect=nagios/dbmoni@BBA
    OK - 0.21 seconds to connect as NAGIOS
    dbnagio$ check_oracle_health --mode=connection-time --connect=BBA --runas=dbnagio --environment ORACLE_HOME=$ORACLE_HOME
    OK - 0.17 seconds to connect as OPS$DBNAGIO

The background for this example is the following scenario with a SAP-Server:

Only local connections to the database are allowed. The database isn’t reachable over the network. Logging in with username and password is not possible.

Only database-users that are authenticated through the operating system (OPS$-User) are allowed to connect.

These users are not allowed to connect via SSH. (Therefore /bin/false).

Because the Nagios user qqnagio is allowed to connect via SSH, it can’t be used as the database user. But NRPE, which executes the plugin, runs under the qqnagio account.

Use of environment variables

It is possible to omit --connect (and, if not needed, --user and --password) completely if you provide the corresponding values in environment variables. Since version 3.x, Nagios lets you extend service definitions with your own attributes (custom object variables). These appear in the environment during the execution of the check command.

The environment variables are:

    NAGIOS__SERVICEORACLE_SID (_oracle_sid in the service definition)
    NAGIOS__SERVICEORACLE_USER (_oracle_user in the service definition)
    NAGIOS__SERVICEORACLE_PASS (_oracle_pass in the service definition)


Installation

The installation of the perl-modules DBI and DBD::Oracle is required.

After unpacking the archive, run ./configure. ./configure --help prints the available options together with their default values for building the plugin.

--prefix=BASEDIRECTORY Specify a directory in which check_oracle_health should be stored. (default: /usr/local/nagios)

--with-nagios-user=SOMEUSER This user will be the owner of the check_oracle_health file. (default: nagios)

--with-nagios-group=SOMEGROUP The group of the check_oracle_health plugin. (default: nagios)

--with-perl=PATHTOPERL Specify the path to the perl interpreter you wish to use. (default: perl in PATH)

Download

check_oracle_health-1.7.8.1.tar.gz

Some versions of tar have problems with the long filenames. In this case, please unpack the shar package with cat check_oracle_health-xxx.shar.gz | gzip -d | sh