
TALLINN UNIVERSITY OF TECHNOLOGY
Faculty of Information Technology
Department of Computer Science

Chair of Network Software

CHOOSING AN OPEN-SOURCE LOG MANAGEMENT SYSTEM FOR SMALL BUSINESS

Master’s Thesis

ITI70LT

Student: Artyom Churilin
Student code: 113832IVCMM
Advisor: Risto Vaarandi, Ph.D

Tallinn, 2013


Declaration

I hereby declare that I am the sole author of this thesis. The work is original and has not been

submitted for any degree or diploma at any other University. I further declare that the material

obtained from other sources has been duly acknowledged in the thesis.

……………………………………. ………………………………

(date) (signature)


List of Acronyms and Abbreviations

AMQP Advanced Message Queuing Protocol

APT Advanced Persistent Threat

CERT Computer Emergency Response Team

CIRT Critical Incident Response Team

CPU Central Processing Unit

DNS Domain Name System. Often used to refer to a DNS server

ELSA Enterprise Log Search and Archive. Open-source log management

system created by Martin Holste – former Security Incident

Response Team Lead specializing in network security monitoring

and open-source tools

FIFO First In First Out; in this paper used to refer to a named or unnamed pipe. A

pipe is a mechanism for inter-process communication; data written

to the pipe by one process can be read by another process

GELF Graylog Extended Log Format

GNU A recursive acronym for GNU's Not Unix
GNU GPL GNU General Public License, widely used license for free software

GUI Graphical User Interface

LDAP Lightweight Directory Access Protocol

AD Microsoft Active Directory

PCAP Packet Capture. Application programming interface for capturing

network traffic.

PRI Priority field in Syslog message

RFC Request for Comments

RPM A package management system for many Linux distributions.


sendmail mail server application used on Unix platforms

TCP Transmission Control Protocol.

URL Uniform Resource Locator. Sometimes referred to as a “web link”

UDP User Datagram Protocol.

VHD Virtual Hard Disk. A file format supported by many virtual platforms


Abstract

This thesis focuses on the comparison of three popular open-source log management systems. Its purpose is to give an overview of these systems and provide guidelines for choosing the one best suited to a small company. The choice was based on a comparative analysis as well as performance and usability testing.

ELSA is a high-performance open-source log management system that can challenge enterprise-grade commercial solutions. It was designed for effective incident response and for fighting APT (Advanced Persistent Threats).

Kibana is a log analysis front end for Logstash and Elasticsearch. It can also be used with other back ends that support formatted output to Elasticsearch, such as Rsyslog with the omelasticsearch module.

Graylog2 is an alternative log management tool with its own web GUI. A speciality of Graylog2 is that logs can easily be divided into different streams in order to give different users access to specific types of logs.

Performance testing showed that ELSA is the fastest and can handle on average 14,285.7 logs per second on the modest hardware resources used for testing. As the solution is meant for small business, performance is not a crucial factor, so Graylog2 and Kibana can compete well with ELSA under the given conditions.

According to the usability test results, Kibana is the most usable system.

Kibana with Rsyslog was chosen as the best-fitting solution for a small company. It has some shortcomings with authentication and saved searches, but its usability, ease of installation and universality make it an outstanding solution for small business. The missing functions are under development; meanwhile, external mechanisms and workarounds can be used.


Table of Contents

List of Figures
List of Tables
1. Introduction
1.1. Event logs
1.2. Central log management
1.3. Purpose of the thesis
1.4. Outline of the thesis
2. Log collection
2.1. Logging protocols
2.1.1. BSD Syslog protocol
2.1.2. IETF Syslog protocol
2.2. Non-GUI logging solutions
2.2.1. Unix Syslogd software suite
2.2.2. Syslog-ng framework
2.2.3. Rsyslog software suite
2.3. Graphical log management solutions
2.3.1. Graylog2
2.3.2. Kibana
2.3.3. ELSA
3. Comparative analysis
3.1. Structure
3.1.1. Graylog2 structure
3.1.2. Kibana structure
3.1.3. ELSA structure
3.2. Input and output
3.2.1. Graylog2 input and output
3.2.2. Kibana input and output
3.2.3. ELSA input and output
3.3. Interface
3.3.1. Graylog2 interface
3.3.2. Kibana interface
3.3.3. ELSA interface
3.4. Features
3.4.1. Graylog2 features
3.4.2. Kibana features
3.4.3. ELSA features
3.5. Search
3.5.1. Graylog2 search
3.5.2. Kibana search
3.5.3. ELSA search
3.6. Conclusion based on comparative analysis
4. Choosing a log management solution
4.1. Logging requirements for small business
4.2. Testing
4.2.1. Testing environment
4.2.2. Performance testing
4.2.3. Usability testing
5. Implementation
5.1. Production environment
5.2. Implementation of Kibana in production
6. Future research
7. Summary
Resümee
List of References
Appendices
Appendix 1 - Basic Event Log Cycle
Appendix 2 - Logstash Inputs, Filters and Outputs
Appendix 3 - Rsyslog main components installation
Appendix 4 - Kibana setup example scheme
Appendix 5 - TCP and UDP output options in Logstash
Appendix 6 - Graylog2 setup example scheme
Appendix 7 - Graylog2 tweaked settings
Appendix 8 - Graylog2, Kibana and ELSA component details
Appendix 9 - Lucene search
Appendix 10 - Kibana search examples
Appendix 11 - ELSA search examples
Appendix 12 - ELSA performance test details


List of Figures

Figure 1 Log management solution model
Figure 2 Graylog2 software components
Figure 3 Kibana main components
Figure 4 ELSA main components
Figure 5 Performance test results statistics compared
Figure 6 Relative increase in performance with 4 cores
Figure 7 Graylog2 performance test results, logs/sec
Figure 8 Kibana and Logstash performance test results, logs/sec
Figure 9 Kibana and Rsyslog performance test results, logs/sec
Figure 10 ELSA performance test results, logs/sec
Figure 11 Scheme of Kibana implementation


List of Tables

Table 1 Basic overview of the log management solutions
Table 2 Advantages and disadvantages of log management solutions
Table 3 Usability test score


1. Introduction

Today's computer networks are very complex. Operating systems have millions of lines of code, and data volumes and transfer rates are continuously growing to meet the demands of the market. Even relatively small networks can produce millions of events per second. These events vary in importance and are often interconnected.

What are these events, how can they be managed, and how can useful information be extracted from them? What do the current popular solutions offer, and how should one be chosen for a small company?

There are various commercial log management solutions available on the market. These solutions are quite expensive and hardly affordable for small companies. Fortunately, there are open-source log management tools which are free of charge; the only cost is the hardware, or hardware resources on a virtual platform. As small companies normally have only a few technicians, it is important that the solution is easy to install, maintain and use. Performance requirements for small business are normally moderate, but this depends on the specific environment.

1.1. Event logs

An event can be defined as "a relevant change in state" of a system [1] or, alternatively, as an "action or occurrence detected by a program". Examples include: a network packet arriving at a switch or a firewall, a user running an executable, a network link going down, or a user browsing a website receiving error code 404 because of a broken URL. IT systems handling these events generate event messages and usually store them locally by default or, if specifically configured, send them to a remote location. When these messages are recorded they are referred to as event logs or simply logs.

There are several standards and formats for log messages, but in general all logs consist of two main parts. The first is the timestamp, stating the date and time the event happened. The second is the data, containing information about the event itself. Logs can have further distinctive parts such as facility (the type of software that generated the event), source IP and severity (e.g. error, info, debug). A typical event log cycle is presented as a diagram in Appendix 1 at the end of the thesis.

1.2. Central log management

Many server and client operating systems, network switches, routers, firewalls, printers and even VoIP phones have the capability to produce logs and send them over the network. Depending on the size and complexity of the IT infrastructure, there can be tens, thousands or even millions of events per second. These events vary in importance and urgency, but all of them are required to get the full picture of what is going on in the network and inside the nodes' operating systems.

By default, logs are stored locally. This setup has many drawbacks. Firstly, it is not efficient, as each device has to be managed separately. Secondly, logs stored locally can be deleted or changed. If an attacker or malware manages to infiltrate a network device or a server, the logs, including the records about the security breach, can be changed or deleted; in that case the attack would not even be noticed. Thirdly, if a device's memory is corrupted or its hardware fails, the local logs might not be accessible at all, and it might not be possible to find out the reason for the malfunction. A central log management and event alert system can help solve these issues.

It is of crucial importance for the IT department of any organisation to be able to efficiently track any event in the network within the needed timeframe. One logical solution to this issue is to send all logs to a central log server. Modern logging protocols support encryption and authentication to secure log collection. Software development, website administration, network administration and incident response are some examples of activities that require efficient log management.

1.3. Purpose of the thesis

The purpose of this thesis is to give guidelines for choosing a solution for small business and to choose the best-suited open-source log management system for a small target company. The choice is based on a comparative analysis as well as performance and usability testing.

1.4. Outline of the thesis

Chapter one of the thesis states the research problem. The main standards and protocol suites are described in chapter two.

A comparative analysis of the solutions' features is presented in chapter three.

Chapter four describes the performance and usability testing and presents the results.

The implementation plan of the chosen log management system in a small company is described in chapter five.

Chapter six offers some ideas for future research.

Chapter seven summarizes the thesis.


2. Log collection

Event logs can be generated by most applications, operating systems and network devices. Logs can be used for incident investigation, historical reporting, debugging etc. Because event logs are produced in real time, they can also be used by real-time monitoring systems. Such monitoring solutions often include a frontend with an analytical module and dashboards that show the current as well as the historical status. Usually such solutions can send notifications for specific events (e.g. in the form of email alerts).

2.1. Logging protocols

There are several main standards and protocol suites that are currently used in applications,

operating systems and network devices. New standards were introduced to address the

shortcomings of their predecessors.

2.1.1. BSD Syslog protocol

BSD Syslog was developed in the 1980s by Eric Allman for the sendmail application as an alternative to programs appending messages directly to flat files. According to RFC 3164, the sender sends a syslog message with a maximum size of 1 KB to the receiver over UDP; destination port 514 is used and source port 514 is recommended.

A syslog message is sent in a UDP packet with the following payload:

<PRI>Timestamp Hostname Content

The formula for calculating PRI is:

PRI = 8 * Facility + Severity

Facility defines the software component that generated the event. Here are the facility values used

for calculating PRI: kern (0), user (1), mail (2), daemon (3), local0..7 (16..23)

Severity defines the level of relative event importance. Here are the severity values used for

calculating PRI: emerg (0), alert (1), crit (2), error (3), warning (4), notice (5), info (6), debug (7).

The timestamp has the following syntax: "MMM DD hh:mm:ss". The hostname part contains the sender's hostname or IP address. The first 32 alphanumeric characters of the content field are regarded as the tag field (the name of the logging program), and the rest is regarded as the message field [3].
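Putting these parts together, a hypothetical BSD syslog packet payload for a warning from the daemon facility would carry PRI = 8 * 3 + 4 = 28 and could look as follows (the hostname and program name are invented):

<28>Feb 11 14:21:05 web01 mydaemon[1847]: connection to 192.0.2.10 closed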

One of the drawbacks of the BSD syslog protocol is that it uses UDP only. This means there is no delivery control, as no acknowledgement of receipt is made [4]. Another limitation is that BSD syslog does not support encryption, so messages are sent in clear text. It also does not support authentication. Timestamps carry no time zone information and the time resolution is limited to seconds. UTF-encoded characters are not supported. These shortcomings were addressed by the IETF syslog protocol (Chapter 2.1.2).


2.1.2. IETF Syslog protocol

The IETF Syslog protocol is defined by RFCs 5424-5426. It supports TLS, and the default port for message reception is 6514/tcp. Both the message sender and receiver must support certificate-based authentication; however, the administrator chooses the authentication options. Messages are sent as TLS application data, which consists of one or more syslog frames.

RFC 5426 sets the requirements for message transmission over UDP: messages are received at 514/udp by default, and each message is sent as a single UDP packet. IETF syslog messages are more structured than BSD syslog messages. Here is the structure of an IETF message:

<PRI>Version Timestamp Hostname Application PID MsgID StructData Message
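For illustration, here is the first example message from RFC 5424 (the optional byte order mark before the message text is omitted); the "-" values indicate empty PID and structured data fields:

<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed for lonvick on /dev/pts/8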

To sum up, the IETF syslog protocol is more structured, transmission-reliable and secure than BSD syslog [3].

2.2. Non-GUI logging solutions

Since the 1980s, when the BSD syslog protocol was created, there have been some important developments in syslog-based solutions. Here are some important events that have shaped today's non-GUI open-source syslog market:

• 1998 Balabit releases Syslog-ng

• 2004 Rsyslog is released

• 2007 Syslog-ng announces Syslog-ng PE (premium edition)

Around the time Syslog-ng went partially commercial in 2007 by introducing the PE version, Rsyslog reached the same level of features. On February 28th, Rsyslog 3.12.0 was released. According to Rainer Gerhards, from that date on Rsyslog supported all major Syslog-ng features and had a number of major features exclusive to it; he considered Rsyslog 3.12.0 fully superior to Syslog-ng at that point in time, with the exception of platform support [5].

Syslog-ng PE has some additional advanced features such as encrypted log storage, Microsoft Windows support and client-side failover [6]. Judging by popularity, community support and online discussions, Syslog-ng OSE (open-source edition) and Rsyslog are the most widely used open-source non-GUI syslog solutions.


2.2.1. Unix Syslogd software suite

UNIX syslogd (the syslog daemon) can receive messages from a local file system socket and from UDP port 514, and can send output to local files or to a remote syslogd instance. The syslogd configuration is usually stored in /etc/syslog.conf, which contains single-line rules. Each rule consists of a selector and an action, where the selector is a list of "facility.severity" pairs and the action specifies a destination for the message. The facility can be set to one of the standard syslog facility classifiers or to "*", which means any facility. The severity can be set to one of the standard syslog severity classifiers or to "none". Flat files, FIFOs, terminals and remote log servers are usually supported as destinations. This suite is still used for simple solutions, but it has generally been superseded by more functional software suites like Syslog-ng and Rsyslog.
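To illustrate the selector-action format described above, here is a minimal /etc/syslog.conf sketch (the file paths and the remote host are hypothetical; "@" forwards messages to a remote syslogd over UDP):

# selector (facility.severity)   action (destination)
mail.info                        /var/log/mail.log
kern.*                           /var/log/kernel.log
*.err                            @loghost.example.com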

2.2.2. Syslog-ng framework

Syslog-ng is one of the most prominent syslog frameworks, with a very large user base. It supports logging over both UDP and TCP and, in addition to the BSD syslog protocol, also supports the IETF syslog protocol, including encryption and authentication. Syslog-ng employs regular expressions for matching and filtering messages by tag, message text, etc. It supports custom message templates and allows the user to change the log message format and the set of message fields that are logged [7].
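As a hedged sketch of this configuration style (the source, filter and destination names and the file path are hypothetical), a minimal syslog-ng configuration that receives remote BSD syslog messages and writes authentication-related ones to a dedicated file could look as follows:

source s_net { udp(port(514)); tcp(port(514)); };
filter f_auth { facility(auth, authpriv); };
destination d_auth { file("/var/log/central/auth.log"); };
log { source(s_net); filter(f_auth); destination(d_auth); };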

2.2.3. Rsyslog software suite

Rsyslog is an advanced open-source logging solution. The letter R in the name stands for "reliable", which mainly emphasises the use of TCP as a transport and does not imply that its predecessors were unreliable. Rsyslog can be used under the terms of the GPLv3 license, and it can also be used in non-GPLv3-compatible projects in some special cases described in the license agreement [8].

Rsyslog was developed in 2004 based on the standard sysklogd package (syslogd and klogd; the latter handles kernel messages). The goal of the Rsyslog project is to provide a more feature-rich and reliable syslog daemon while retaining drop-in replacement capability for stock syslogd [5]. It adds many features to Unix syslogd, including support for the IETF syslog protocol, advanced message filtering and custom message formatting.

Rsyslog configuration is usually stored in /etc/rsyslog.conf. To ease migration from syslogd, it supports the traditional selector-action rules of Unix syslogd. Rsyslog has become the default logging solution for many Linux distributions.
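A short rsyslog.conf sketch combining legacy selector-action rules with forwarding to a central server (the server name is hypothetical; "@@" sends over TCP, a single "@" over UDP):

# keep authentication messages in a local file
auth,authpriv.*    /var/log/auth.log
# forward everything to the central log server over TCP
*.*    @@logserver.example.com:514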

According to Rainer Gerhards, the main author of Rsyslog, its main competitor is Syslog-ng. Rsyslog's advantage is that it is free of charge including all features, whereas the full-featured Syslog-ng PE (premium edition) requires a paid license.


Rsyslog maintains backward compatibility with syslogd; as its author puts it, the basic syslog.conf "format is extremely well known, covered in a lot of text books, taught in numerous courses and used in a myriad of Internet tutorials. So if we would abandon it, we would thrash a lot of people's knowledge and help resources" [9].


2.3. Graphical log management solutions

Whatever solution is used for the back end of log collection, it is important to have the logs presented in a comprehensible and useful manner. The aim of a Graphical User Interface (GUI) is to give quick and easy access to an IT system. User management, system configuration, graphs with historical data and dashboards with real-time statistics are some of the main useful features available in a good GUI frontend. The productivity and user experience of the operator of such a GUI depend on how flexible, customisable and usable these options are. There is no perfect GUI for all cases; it is rather a question of what suits the given environment best. Open-source graphical log management solutions are quite flexible and can be used in combination with other non-GUI solutions. The following main components of a graphical log management solution can be outlined (see Figure 1):

• log shipper

• log parser

• log storage, indexing and search

• web interface

Figure 1 Log management solution model

Most of the components, depending on the solution, could be replaced by some alternative ones.



A log shipper can normally be any log collection software, such as a syslog daemon. It serves as the entry point for event logs from local services or the network and applies some action to the logs. In a log management system it normally sends the logs on for further parsing and filtering (syslogd, Rsyslog, Syslog-ng, Logstash, Graylog2 etc.).

A log parser is a separate process or module responsible for parsing fields out of raw log messages and creating structured messages suitable for writing into log storage (Grok, JSON, Ruby, Syslog4j etc.). Log storage, indexing and search are performed using databases and indexing software (MySQL, MongoDB, Tokyo Cabinet, Elasticsearch, Sphinx search).

The web interface works as a frontend to all of the components and provides the means to manage the log data (LogAnalyzer, Kibana, Graylog2 web interface, ELSA web interface etc.).

The distribution of functions among components depends on the architecture of the solution. Multiple functions can be executed by a single part of the log management solution: e.g. the Graylog2 server is both log shipper and parser, Logstash is a log shipper and indexer and has its own integrated web interface, and Kibana is a frontend web interface and indexer. In many cases the parsing and storing functionality is implemented inside the log shipper.

A single function can also be divided among multiple components, e.g. Graylog2 storage is handled by Elasticsearch (messages) and MongoDB (statistics, user accounts) [10]. Log management solutions might include other components such as various plugins, filters and middleware. This is described in more detail in chapter 3.1 of the thesis.

2.3.1. Graylog2

Graylog2 is an open-source, GPLv3-licensed log management system that stores logs in Elasticsearch. It was designed by Lennart Koopmann, a developer at XING AG, and was released in 2010. It consists of a server written in Java that accepts syslog messages via TCP or UDP and stores them in Elasticsearch indexes. The second part is a Ruby on Rails web interface. The Graylog2 web interface allows searching through the logs, applying filters, blacklisting strings, quickly viewing the logs of each monitored host and flexibly managing access to the logs by authorising users to see specific log "streams". The main configuration file is graylog2.conf, the embedded Elasticsearch configuration file is graylog2-elasticsearch.yml and the Elasticsearch configuration file is elasticsearch.yml.

2.3.2. Kibana

Kibana is a browser-based frontend for Logstash and Elasticsearch written in JavaScript and Ruby. It was designed and developed in 2012 by Rashid Khan, a developer at the Elasticsearch project. Its default log shipper, Logstash, is flexible open-source log management software supporting a long list of inputs, filters and outputs. As an alternative to Logstash, Kibana can be configured to work with other log management software that supports output to Elasticsearch. The setup examples described further in the thesis are Kibana with Logstash and Kibana with Rsyslog. The main configuration file for Kibana is KibanaConfig.rb, for Elasticsearch elasticsearch.yml, for Logstash logstash.conf and for Rsyslog rsyslog.conf.

2.3.3. ELSA

ELSA stands for Enterprise Log Search and Archive. It is an open-source log management solution written in C. ELSA was created by Martin Holste, a former Security Incident Response Team leader, currently employed at Mandiant (a company offering information security services). Its author briefly describes the program as a GPLv2 framework around Syslog-ng, MySQL and Sphinx search [11].

Perl is used as a pipe between the components, e.g. logs are taken from the Syslog-ng output and prepared for batch loading into MySQL.

ELSA was designed to support efficient network incident response. It is oriented towards high performance and is advertised to handle more than a million logs per minute and to return a billion results for a query in half a second on modest hardware [12].

ELSA has two main installation components: node and web. ELSA nodes that only gather, store and forward the logs need only the node component installed. Nodes that are used as a gateway for queries need the ELSA web component as well. In small setup scenarios, like the one used for testing, both components are installed on the same node. The main configuration files are elsa_node.conf and elsa_web.conf.


3. Comparative analysis

The comparative analysis is based on primary data generated during the tests and on secondary data from web resources. A basic overview of the solutions is presented in Table 1.

Name: Graylog2 | Kibana | ELSA
Language: Java, JavaScript, Ruby | Java, JavaScript, Ruby | C, Perl
Protocols: BSD & IETF syslog, GELF, GELF via HTTP, AMQP | BSD & IETF syslog, AMQP, XMPP… | BSD & IETF syslog
Transport: TCP, UDP | TCP, UDP | TCP, UDP
Log shipper: Graylog2 | Logstash, Rsyslog | Syslog-ng
Log parser: syslog4j | grok, json, syslog4j… (28 filters) | perl, PatternDB
Storage: Elasticsearch, MongoDB | Elasticsearch | MySQL
Indexing: Elasticsearch | Elasticsearch | Sphinx search
License: GNU GPLv3 | Apache 2.0 | GNU GPLv2
Documentation: Good: platform-independent instructions, official examples for Debian, unofficial for RHEL | Good: platform-independent instructions, official examples for Debian, unofficial for RHEL | Excellent
Installation scripts: Script available for Debian-based systems | Script available for Debian-based systems | Multiplatform, fully automatic
Demo: http://public-graylog2.taulia.com/session | http://demo.kibana.org/#/dashboard | no live demo
Authentication: Local, LDAP | Needs external authentication, e.g. with the Passenger module in Apache or Nginx | none, local or LDAP
Authorisation: Local, LDAP | Under development, Passenger can be used | Account- or group-based, local or LDAP
Performance on modest hardware suitable for: Small and medium-sized business | Medium-sized business and enterprise | Enterprise
Log lines/second announced: thousands per second | thousands per second | tens of thousands per second
Log lines/second tested: 1,428.6 | 5,681.82 | 14,285.7
Saved searches: Streams | No | Yes
Search syntax: Lucene + regular expressions | Apache Lucene search | Google syntax
Event triggering and alerts: Regular expression templates + email alerts | No native alerts or event triggering, done on the log shipper side | Scheduled searches, actions + alerts

Table 1 Basic overview of the log management solutions


3.1. Structure

3.1.1. Graylog2 structure

Graylog2 consists of two main components: a server written in Java and a web interface written in Ruby using the Ruby on Rails web framework. The Graylog2 server listens for log messages, receives and parses them, does the indexing and stores the messages in Elasticsearch, while statistical data, graphs and user accounts are stored in MongoDB. For an overview of how Graylog2 can be implemented, please see the scheme in Appendix 6.

Elasticsearch is a highly scalable, resilient, schema-free, document-oriented, non-SQL open-source database. It is an Apache 2 licensed open-source distributed search engine built on top of Apache Lucene [13]. Elasticsearch is used by both Graylog2 and Kibana.

3.1.2. Kibana structure

Kibana is a web interface written in JavaScript and Ruby using the Sinatra web framework [14]. A typical minimal deployment consists of Logstash and Kibana. Logstash is used for receiving log messages from various sources, optionally filtering the logs and sending them through one of the supported outputs. A simple example would be Logstash listening for IETF syslog formatted messages on TCP and UDP port 514 and, without applying additional filters, forwarding the logs to Elasticsearch.

Logstash inputs, filters and outputs are described in more detail in chapter 3.2.2 of the thesis.

As an alternative setup for Kibana, Logstash can be replaced with Rsyslog for sending specifically parsed logs to Kibana via Elasticsearch. Rsyslog sends formatted logs into Elasticsearch using the omelasticsearch module. These logs are parsed into JSON and indexed in a format suitable for Kibana.


Here are the lines in the rsyslog.conf file that are required for this setup:

module(load="omelasticsearch")

$template Syslog2Kibana, "{\"@timestamp\":\"%timereported:::date-rfc3339%\",\"@message\":\"%rawmsg:::json%\",\"@type\":\"syslog\",\"@tags\":[],\"@fields\":{\"receptiontime\":\"%timegenerated:::date-rfc3339%\",\"host\":\"%HOSTNAME:::json%\",\"tag\":\"%syslogtag:::json%\",\"facility\":\"%syslogfacility-text%\",\"severity\":\"%syslogseverity-text%\",\"msgtext\":\"%msg:::json%\"}}"

$template SyslogIndex, "rsyslog-%timereported:1:10:date-rfc3339%"

*.* action(type="omelasticsearch"
    template="Syslog2Kibana"
    dynSearchIndex="on"
    searchIndex="SyslogIndex"
    server="localhost"
    serverport="9200"
    bulkmode="on")

The first line loads omelasticsearch, the output module for Elasticsearch. The next line defines the template for structuring the message and timestamp, which was given the name "Syslog2Kibana". The second template is for the search index, which was given the name "SyslogIndex". There is a special setting in the KibanaConfig.rb file that needs to be set for Kibana to index the logs coming from Rsyslog. The settings are presented below:

Smart_index = true
#Smart_index_pattern = 'logstash-%Y.%m.%d'
Smart_index_pattern = 'rsyslog-%Y-%m-%d'

These lines in KibanaConfig.rb enable the smart index feature and replace the Logstash pattern with the Rsyslog pattern, allowing Kibana to index the Rsyslog data from Elasticsearch. For an overview of how Kibana can be implemented, please see the scheme in Appendix 4.

3.1.3. ELSA structure

ELSA uses Syslog-ng for receiving logs and its PatternDB for parsing, which is claimed by the designer to be more efficient than using computationally intensive regular expressions. An alternative input is via HTTP, which is used for communication between nodes in a cluster. Parsed logs are written into a raw file, then batch-loaded into the MySQL database and indexed by Sphinx search. The batch is loaded by a script, by default every minute; this setting can be changed in the elsa_node.conf file by setting a value in seconds for "index_interval". After each batch is loaded, Sphinx indexes the newly inserted rows in temporary indexes and then, in larger batches every few hours, in permanent indexes [12]. The ELSA flow diagram is presented below:

Network → Syslog-ng (PatternDB) → Raw text file

or

HTTP upload → Raw text file

Batch load (by default every minute):

Raw text file → MySQL → Sphinx

Additional functionality can be added to ELSA using plugins. New plugins can be added by subclassing the "Info" Perl class and editing the elsa_web.conf file to include them. The plugins included in ELSA by default are listed below:

• Windows logs from Eventlog-to-Syslog

• Snort/Suricata logs

• Bro logs

• URL logs from httpry_logger

These plugins allow applying specific actions to the log data. For example, if the URL plugin is configured, any log that contains an IP address will have a "getPcap" option which auto-fills the PCAP request parameters for one-click access to the traffic related to the log being viewed. This option is available if a PCAP server such as OpenFPC or StreamDB is installed and configured in elsa_web.conf.

3.2. Input and output

3.2.1. Graylog2 input and output

Graylog2 accepts syslog messages via TCP and UDP. Additionally, it accepts messages in its own Graylog Extended Log Format (GELF) via TCP, UDP and HTTP. GELF logs are essentially JSON-formatted messages compressed with Gzip.
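As an illustration, a GELF payload (before compression) is a JSON document along the following lines; the field names follow the GELF specification, while the values and the additional "_source_ip" field are invented:

{
  "version": "1.0",
  "host": "web01",
  "short_message": "Failed password for invalid user admin",
  "full_message": "sshd[1847]: Failed password for invalid user admin from 192.0.2.10 port 52114 ssh2",
  "timestamp": 1365412800,
  "level": 4,
  "facility": "sshd",
  "_source_ip": "192.0.2.10"
}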

Graylog2 also supports AMQP (Advanced Message Queuing Protocol) input via message queuing middleware such as RabbitMQ, Apache Qpid, OpenAMQ, SwiftMQ etc. Message queuing software is used to make sure that messages are delivered from point A to point B. It stores messages in memory (and writes them to disk), waits for the buffer to clear after a peak of log traffic and then offers these messages to the logging system.

The default port for syslog is 514, for GELF 12201 and for AMQP 5672. Graylog2 uses Drools Expert [16] to check the incoming log messages against a user-defined rule file. Jabber/XMPP is used for sending alerts. Internal metrics and stream counts can be stored in Graphite [17] and Librato [18] to turn these statistics into visualizations.

3.2.2. Kibana input and output

Kibana imports logs from Elasticsearch. Originally, Kibana was designed as a frontend for Logstash. Logstash supports a wide range of inputs, including IETF syslog, GELF, Elasticsearch, snmptrap, eventlog, Twitter etc. According to the Logstash homepage, 37 inputs, 28 filters and 47 outputs are currently supported (for a complete list see Appendix 2).

A simple scenario (see the configuration below) is receiving logs on TCP and UDP port 514 and sending all logs to Elasticsearch. For the TCP and UDP inputs, "port" and "type" are required fields (for all options see Appendix 5).

input {
  tcp {
    port => 514
    type => syslog
  }
  udp {
    port => 514
    type => syslog
  }
}
output {
  elasticsearch {
  }
}

In the scenario where Rsyslog is used instead of Logstash, all Rsyslog functionality applies, including its inputs and outputs. Rsyslog receives local messages from the kernel and the local system; remote messages can be received in BSD or IETF syslog format. Received messages can be written to log files, sent to remote syslog servers, etc. Additional modules can be used in Rsyslog, e.g. omelasticsearch for output from Rsyslog to Elasticsearch, or input modules for specific sources.


A more advanced Logstash configuration is presented below.

input {
  tcp {
    port => 514
    type => rsyslog
  }
  udp {
    port => 514
    type => rsyslog
  }
}
filter {
  grok {
    type => "rsyslog"
    pattern => [ "<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{PROG:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" ]
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "received_from", "%{@source_host}" ]
  }
  syslog_pri {
    type => "rsyslog"
  }
  date {
    type => "rsyslog"
    syslog_timestamp => [ "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
  }
  mutate {
    type => "rsyslog"
    exclude_tags => "_grokparsefailure"
    replace => [ "@source_host", "%{syslog_hostname}" ]
    replace => [ "@message", "%{syslog_message}" ]
  }
  mutate {
    type => "rsyslog"
    remove => [ "syslog_hostname", "syslog_message", "syslog_timestamp" ]
  }
}
output {
  elasticsearch_http { }
}

In this scenario Rsyslog ships logs to Logstash; the "syslog_pri", "grok" and "mutate" plugins are then used to parse the logs, which are sent to Elasticsearch via HTTP [19].


3.2.3. ELSA input and output

ELSA uses Syslog-ng as the log receiver. All the inputs of Syslog-ng, called source drivers, are valid for ELSA. In syslog-ng.conf a source is configured using the following syntax [7]:

source id { driver1(opt1); driver2(opt2); ...; };

Some source drivers:

• file(fname [options]) – read messages from a file fname (usually employed for reading messages from a special file of kernel messages, for example /proc/kmsg)
• internal() – read Syslog-ng internal messages
• unix-stream(fname [options]), unix-dgram(fname [options]) – read messages from a UNIX file system socket fname in stream or datagram mode
• tcp([options]), udp([options]) – receive BSD syslog messages from remote hosts over TCP or UDP
• syslog([options]) – receive IETF syslog messages

An ELSA node can forward log messages between nodes using SCP and HTTP/HTTPS. Although ELSA has a certain predefined log flow, it should be possible to use Syslog-ng to send output in parallel with the batch loading into MySQL, e.g. to forward syslog messages to an IP address and a TCP or UDP port, as sketched below.
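A minimal syslog-ng sketch of such a parallel forwarding destination (the source and destination names as well as the remote address are hypothetical and would have to match the names used in ELSA's generated syslog-ng.conf):

destination d_remote { tcp("192.0.2.50" port(514)); };
log { source(s_network); destination(d_remote); };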

3.3. Interface

3.3.1. Graylog2 interface

The Graylog2 interface is arranged into tabs. Menu buttons at the top of the page can be used to switch between the tabs: messages, streams, hosts, blacklists, settings and users. With default settings, the messages tab is displayed to the administrator first after logon. A regular user can see only the streams tab. The messages tab consists of the following main parts: a search field, a menu, an overview table and a sidebar.

The search field spans the whole page and contains sample search instructions in faded text which disappear once the field is clicked to place the cursor in it. Next to the search field there is a relative-time dropdown with options ranging from 5 minutes through 1 day and 1 month to "always". The default time is 5 minutes, so recent data is displayed to the administrator right after logon.

The overview table contains a list of log message lines. Messages can be clicked to display more information in the sidebar: a permalink, a breakdown of the message, the full message and the stream name if it belongs to one. The overview table also shows the total number of logs and has links to toggle between viewing recent and all log messages. Additionally, there is a button in the shape of an asterisk which can be used to highlight today's messages.


By default, the sidebar shows a graph of recent incoming logs and a welcome message. Mini graphs of favourite streams are also shown there. In general, the sidebar shows details of the active object clicked by the user, e.g. a log message in the overview table. When scrolling down the list of logs, a "Back to top" button appears on the screen, which makes it easy to get back to the top of the page where the menus and the search field are located.

When the sidebar is displaying a graph, it has a "server health" button. It leads to a page with a dashboard of near-real-time throughput statistics, which also shows the recent highest value. (The current throughput in logs per second is also shown on the main page in the top right corner.) Apart from the dashboard, the "server health" page contains status information on the Elasticsearch server and also shows status log messages produced by the main Graylog2 server application (e.g. Graylog2 server start-up and shutdown). The streams tab contains controls to create saved searches and arrange them into categories. The hosts tab contains a list of hosts, which are added automatically once logs start arriving from a source. The blacklist tab has an option to create a blacklist with a set of regular expression rules to filter out unneeded content, which will be discarded.

The settings tab has subsections which allow defining the length of a message shown to the user, adding a column to the log list, configuring AMQP settings, adding comments to messages using regular expressions, defining templates for filtering out sensitive data, enabling or disabling plugins and checking whether the latest version of Graylog2 is installed. The users tab allows creating user accounts of two types: admin and reader. A reader user sees only the streams tab, containing the streams assigned by the admin user.

3.3.2. Kibana interface

Kibana has a very well designed interface. It has exactly what is needed for easily searching through formatted data and analysing it with a single click. This does not mean that searching through unformatted logs is impossible; it would just require writing queries manually.

The Kibana web interface home page has the following main sections: a search field, a field panel containing message fields (also referred to as the "Show fields" section), a graph, and a table panel which is mainly a list of logs.

The search area is a large black rectangular frame positioned across the top of the page. In its left part there is a small white Kibana logo serving as a "home" link, and a time dropdown. A white search field is in the middle; when it is blank, it shows "Search" in a thin font. To the right of the search field there are a blue "Search" and a red "Reset" button. The far right part of the search area has a mini dashboard displaying the current number of search hits. The time dropdown next to the search field has relative time options ranging from 5 minutes through 12 hours and 7 days to "All Time". The default time is 15 minutes, and there is also a custom time frame option.

Overall, the interface is very dynamic and interactive. All log lines are clickable and expandable into more detailed fields. Each field can be used for dynamic query building. When fields in the field panel are clicked, a menu with quick stats appears. Buttons such as "score", "trend", "terms" and "stats" inside this menu can be used for various analytical manipulations such as changes in share, average values, distributions represented as pie charts, stock-market-style tables etc. With Kibana 3 it is possible to design a custom interface interactively without any coding. It is possible to create custom panels and dashboards and save these interfaces.

3.3.3. ELSA interface

The ELSA interface design is very minimalistic and conservative. In the administrator account there are five dropdowns which are a little reminiscent of the Windows 95 menu. In the top left corner, above the search field, the Elsa and Admin menus are located ("Elsa", not "ELSA", is used for referring to the menu, as this is how it is written in the ELSA interface; the same applies to the other menu names). The Elsa menu consists of Query Log, Saved Results, Alerts, Active Queries, Dashboards, Saved Searches and Preferences. Query Log contains the list of recent queries and statistics on the time taken to run each query.

The Saved Results section has the list of saved results and allows creating an alert or a schedule. Additionally, it allows rerunning the search and provides a permalink for the query results. It is possible to schedule a rerun of a certain query and apply an action if new events matching the search criteria are recorded. Some of the available actions are: save report, send email, send to CIRT, send to a malware analysis sandbox. The Elsa dropdown can also be used to view alerts, saved searches and active queries. Dashboards can be created and managed through the Dashboards option in the Elsa dropdown.

The Admin dropdown menu allows managing permissions, viewing statistics on a general dashboard, cancelling livetails and viewing alerts. Livetails are live streams of logs; this function is currently deprecated in ELSA because of stability issues [12].

Search results are presented in a tab below the search field; by default a new tab is created for each search. It is possible to reuse the same tab for an updated search by ticking "reuse current tab" to the right of the search field. The ordering style inside the tab can be changed to "grid" with a second tick box in the same area.


An action (e.g. export results, alert or schedule, add to dashboard, save search) can be applied to the search results using the "Results Options" dropdown inside the tab. The search area consists of a field called "Query" and a "Submit Query" button. There are two separate fields (From and To) for the start and end time of the query, which can be filled in using a calendar popup or manually. The "Add term" and "Report on" dropdowns allow using predefined templates for building specific queries, such as Bro, Snort and Windows messages. There is a separate dropdown for setting the type of index to search in: Index, Archive, livetail etc.

3.4. Features

3.4.1. Graylog2 features

Streams in Graylog2 are saved searches that allow quick access to an overview of a certain predefined situation. Streams are defined by rules, which can be regular expressions, facility, severity, host or a custom additional field with a certain predefined value. It is possible to sort them into custom categories. Here is an example of a stream in Graylog2:

Category: security

Stream name: SSH authentication failure

Regular expression rule: sshd\[\d+\]: Failed password for (invalid user )?(\S+)? from ([\d.]+) port (\d+)

It is possible to create blacklists with a set of regular expression terms to filter out certain messages. Messages that match the predefined regular expression patterns are dropped by the server.

Once a message is received and accepted by Graylog2, the originating host is automatically added to the hosts list. The entire logging stream of any monitored host can be quickly accessed in the hosts section, and a host can easily be deleted from this list if it is no longer used. There is a "quick jump to host" search field that can be very useful if the list of monitored hosts is long. A segment of the graph can be highlighted, and the logs belonging specifically to that part of the graph shown, by clicking the "Show messages in range" button.

It is possible to assign an alert to all users or per stream, so that the users assigned to that stream receive an email. This is useful when an event needs urgent attention from a specific person or group. Log rotation can be achieved by setting elasticsearch_max_number_of_indices in graylog2.conf; elasticsearch_max_number_of_indices multiplied by elasticsearch_max_docs_per_index equals the total number of messages held within the setup.
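A sketch of the corresponding graylog2.conf lines (the numbers are arbitrary and only illustrate the retention arithmetic: 20 indices x 2,000,000 documents = 40,000,000 messages kept):

elasticsearch_max_docs_per_index = 2000000
elasticsearch_max_number_of_indices = 20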


3.4.2. Kibana features

Kibana has a very dynamic interface which allows flexible on-demand data analysis and visual representation. Each log line can be expanded with one click within the same area to allow access to its details. There are action buttons that can be used for dynamically creating very specific queries. Each line within the "fields panel" can be clicked to get a multi-purpose menu with quick stats. Using this menu it is possible to see the distribution of the most popular occurrences, add specific columns to the logs in the "table panel" (the same as clicking the plus sign next to any field in the field panel), include and exclude certain fields from the query with a single click (the same as the actions within log details in the "table panel"), use analytical tools on this data and mark all these occurrences in the table panel in red font.

Functions such as Score, Terms, Trend and Stats can be used for data analysis, either by pressing the corresponding buttons inside the menu of a field or by manually piping in the search field.

@fields.host:log NOT @fields.facility:"user" | terms severity

This query can be produced dynamically with six clicks in the "fields panel". The first two clicks: one on @fields.host to open a popup menu and a second on the "include" icon (which looks like a magnifying glass) next to the hostname "log". Next, to exclude all messages with the user facility, click on @fields.facility in the "fields panel" and then on the "exclude" icon (which looks like a "no parking" sign, i.e. a slashed circle). The last two clicks: first on @fields.severity in the "fields panel" and second on the "terms" button inside the popup menu. Statistics are based on the last 2000 logs received, but this amount can be changed by editing the value of "Analyze_limit" in KibanaConfig.rb.

Kibana does not have its own user management by default, but authentication modules can be configured in KibanaConfig.rb. It is possible to hold user accounts in Elasticsearch and use LDAP for authentication [21]. Alternatively, user authentication can be handled with the help of e.g. Apache or Nginx. Log rotation can be done by scheduling a script that deletes old Elasticsearch indices, as they are recorded in separate indices by date.
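As a sketch of such a scheduled clean-up, assuming the default daily logstash-YYYY.MM.DD index naming and a local Elasticsearch instance listening on port 9200, a cron job could run a script like the following to delete the index that has become 30 days old:

#!/bin/bash
# Delete the Elasticsearch index created 30 days ago
# (assumes GNU date and the default Logstash daily index naming)
OLD_INDEX="logstash-$(date -d '30 days ago' +%Y.%m.%d)"
curl -s -XDELETE "http://localhost:9200/${OLD_INDEX}"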

Kibana 3, a new version released in 2013, has an extended dashboard and analytics module. Kibana 3 allows creating great custom dashboards, comparing ranges of events by combining them into one graph, etc. It is possible to save interfaces and queries into Elasticsearch, export them to a file or publish them as a "gist" on the GitHub website [22].


3.4.3. ELSA features

ELSA is oriented more towards performance than dashboards and was designed for incident response and fighting APT. It has a Google-like search and allows sorting search results by any field and producing custom reports. It is possible to export results as a permalink or in Excel, PDF, CSV and HTML formats. ELSA supports full Active Directory/LDAP integration for authentication, authorization and email settings. It supports archiving of logs with a better than 10:1 ratio. ELSA supports email alerts and other actions that can be triggered if defined queries get hits on new log messages. It has a fully distributed architecture and can handle n nodes with all queries executing in parallel. ELSA ships with normalization for some Cisco logs, Snort/Suricata, Bro, and Windows via Eventlog-to-Syslog or Snare [23]. Log rotation can be done by byte or retention period values set in the elsa_node.conf file.

3.5. Search

3.5.1. Graylog2 search

In earlier versions of the Graylog2 web interface the search form was divided into separate fields such as message, timeframe, facility, severity etc. Some of the fields supported Lucene syntax, some required regular expressions. Starting from version 0.10 Graylog2 applies a more user-friendly search method: there is one search field and Lucene syntax can be used in it. There is a quick filter option to filter the search results by message, timeframe, facility, severity and host. The Graylog2 search message field is split into terms, and each part of the query delimited by a space is searched for separately. Apache Lucene syntax allows using wildcards and doing fuzzy and proximity searches (see Appendix 9 for more details).

3.5.2. Kibana search

Kibana (as well as Graylog2 and Elasticsearch) uses the Lucene query syntax for search. It is possible to do a simple full-text query across all the lines of log messages, or to use Lucene to be very specific, target certain fields and add conditions (see Appendix 10 for more details). Its dynamic interface makes creating new queries very easy.

3.5.3. ELSA search

ELSA syntax is basically the same as Google syntax. It is possible to do sub-searches by piping one search into another. There is an important difference between the way queries are handled in ELSA and in the other two solutions: in ELSA it is not possible to use wildcards in basic queries. Only special asynchronous queries can contain wildcards, and results for such queries are sent later by email (see Appendix 11 for more details).


3.6. Conclusion based on comparative analysis

Each of these log management solutions has its strong and weak sides. The choice of a system strongly depends on the environment it will be used in and the goals that are pursued. There is no perfect solution for every purpose and environment. The table below lists the main advantages and disadvantages of the log management solutions according to the author's opinion (see Table 2).

Graylog2
Advantages:
1. Easy basic user management with the possibility of advanced authentication (e.g. LDAP)
2. Saved searches (called streams) can be easily assigned per user
3. Creating blacklists to drop logs that match a pattern from the web interface menu
4. Nice and simple interface
Disadvantages:
1. Insufficient analytical functionality
2. Too many operations needed to see the log details

Kibana
Advantages:
1. Easy point and click analysis
2. Choice between easy integration with Logstash or Rsyslog
3. Really usable and efficient interface
4. Kibana 3 offers easy interface customisation
5. Great dashboards
Disadvantages:
1. No alerts
2. No native user management (in development)
3. No saved searches (in development)

ELSA
Advantages:
1. High-volume receiving/indexing (a single node can receive > 30k logs/sec, sustained)
2. Settings can be changed without restarting services as a scheduled script reads the configuration
3. Customisable action of the Info field in the logs depending on the log type (plugins needed)
4. Allows scheduling searches and various alerts and actions triggered: email, ticket creation
5. Gathers statistics for queries by user and log size and count
Disadvantages:
1. Not too flexible, designed specifically for incident response and high scale
2. Web interface very conservative
3. Livetail not available currently

Table 2 Advantages and disadvantages of log management solutions


In the author's opinion Graylog2 is a great tool for environments that need to give access to specific logs only. An example would be a company that provides IT services and has different teams: developers, system administrators, network administrators and supervisors who should only have access to a specific part of the logs.

Kibana would be the best choice for environments that benefit from a combination of great usability, analytics and good performance.

ELSA should be suitable for high-volume and high-scale log management. It is specifically designed for network incident response and fighting APT. This is a great tool for large network monitoring; for example, ISPs or CERTs could benefit from using ELSA.


4. Choosing a log management solution

In order to choose the best-suiting log management solution, primary and secondary data was collected for a detailed comparison. Secondary data was collected from the official websites of the log management solutions, configuration files and related web resources, e.g. forums and discussions including GitHub, a website for managing development projects [24]. Primary data was generated by setting up the latest versions of all three log management systems in a virtual environment and performing a series of tests. The testing process and results are described in chapter 4.2 and its subsections.

4.1. Logging requirements for small business

Small companies usually have a wide variety of different systems and devices in their infrastructure. It can be a mixture of different vendors and different sorts of operating systems. This sets the requirement that the log management system should be suitable for mixed types of logs.

As the event rates and log message volumes are normally modest, performance is not the key factor in the choice of a log management system. The usual rate for a small company might be 100 - 200 events per second. The number can of course differ depending on the size of the network, the specific environment, the logging level and the tasks solved by log management. This allows solutions with lower performance like Graylog2 and Kibana to compete with high-performance ones like ELSA in the framework of a small company.

As for the target company where the chosen log management solution will be implemented, the event rate is estimated at around 1000 - 2000 logs per second with hypothetical peaks of 3000 per second in case debugging is turned on for the main systems. This relatively high event rate for a small company is expected because all the syslog-capable devices in the local network would be sending logs to the central log management solution, and additionally the logs from critical servers in the cloud might be sent as well. For big companies the event rate could be much higher: 50 000 - 100 000 logs per second.


4.2. Testing

Performance and usability testing was carried out for gathering primary data which is needed for

comparison. Usability testing results are based on the author’s experience and opinion.

4.2.1. Testing environment

Modest hardware specifications were chosen for the performance testing, as small companies, including the one where the tests were carried out, normally have limited resources. Additionally, performance on hardware with low specifications shows how efficiently a system utilizes limited resources. CentOS was chosen as the operating system (rather than e.g. Debian) because it is officially supported by Microsoft Hyper-V, which would be the production environment for the log management solution [25]. Testing was done on virtual machines using Oracle VirtualBox version 4.2.10 r84104.

The basic specifications of the host used for testing are described below:

Hardware used: Acer TimelineX 5830

OS Microsoft Windows 7 Professional 64-bit SP1

CPU Intel Core i5 2430M @ 2.40Ghz Sandy Bridge 32nm

RAM 6,00GB Dual-Channel DDR3 @ 665MHz (9-9-9-24)

Motherboard Acer JM50_HR (CPU1)

Hard Drive 238GB V4-CT256V4SSD2 (SSD)

NIC Atheros AR8151 PCI-E Gigabit Ethernet Controller

CentOS was chosen as the guest operating system. Here are the hardware resources and exact

version of operating system used:

CentOS 6.4 Kernel 2.6.32-358.2.1.el6.x86_64

Assigned hardware resources per log management server:

1 virtual CPU core (for single-core test)

4 virtual CPU cores (for multi-core test)

2048 Mbytes of RAM

Dynamic VHD disk space
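As an illustration only, switching a test virtual machine between the single-core and multi-core configuration can be done from the VirtualBox command line; the VM name used here is an assumption:

# give the test VM 4 virtual CPU cores and 2048 MB of RAM for the multi-core run
# (the VM must be powered off when these settings are changed)
VBoxManage modifyvm "centos-logtest" --cpus 4 --memory 2048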


4.2.1.1. Graylog2 software components

The latest version of the Graylog2 log management solution at the time the performance testing was done was 0.11.0. This version of Graylog2 requires at least Java 1.6 and Ruby 1.9 or higher. The main components of the Graylog2 solution and the corresponding logos are shown in Figure 2 below.

Figure 2 Graylog2 software components

See Appendix 8 for more details.


4.2.1.2. Kibana software components

Kibana was designed as a frontend for Logstash, but it can be used with other backend systems which can send specially structured logs into Elasticsearch (e.g. Rsyslog with the omelasticsearch module). The main components of the Kibana solution and the corresponding logos are shown in Figure 3.

Figure 3 Kibana main components

See Appendix 8 for more details on components.


4.2.1.3. ELSA software components

ELSA can be installed with a fully automated script, install.sh, which installs the program and all the dependencies from scratch. The main components of the ELSA solution and the corresponding logos are shown in Figure 4.

Figure 4 ELSA main components

See Appendix 8 for more details on components.


4.2.2. Performance testing

For comparing the log management systems, a performance test was done. The benchmark used for stress-testing each system consisted of sending a large batch of 100,000 IETF syslog messages to the tested system. In order to ensure reliable delivery of all messages, they were sent over the TCP protocol, without any delays between issuing individual messages. The performance of the system was measured as the overall test execution time. In other words, the execution time reflects the event processing speed of the system as observed by the client, and how much log data the client can realistically transmit to the system in a given time frame.
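In practice the event rate is obtained by dividing the batch size by the measured execution time; for example, using the Graylog2 single-core result reported in section 4.2.2.1.1:

100 000 logs / 170 seconds (2 min 50 s) = about 588,2 logs per second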

The following command in the script is used for sending IETF-formatted logs. The script is presented here:

#!/bin/bash

printf '<6>1 2013-04-25T22:00:00Z myhost kernel - - - this message is a test\n%.0s'

{1..100000} | nc -w 1 -t 127.0.0.1 514

In addition to measuring the event processing speed, CPU consumption of the individual parts of

each log management system was investigated in order to identify potential bottlenecks.

Tools used for performance testing: time [26], nc [27] (netcat), htop [28].

A simple test script logtest.sh was used. The Unix time utility was used to calculate the time it takes to run the script. Below is the shell command used for running the test.

/usr/bin/time -f'%E' ./logtest.sh

(-f'%E' to show only elapsed time without user or system time)

The Unix printf command is used to generate the messages on standard output. The \n escape sequence indicates the end of each line, and the %.0s format specifier consumes the value range in curly brackets, generating the corresponding number of lines. Through the pipe these lines of formatted text are forwarded to netcat and sent over TCP or UDP to the needed IP address and port ("-w 1" defines a 1 second timeout, meaning that if no more input is detected for 1 second the connection is closed; "-t" means TCP, as we needed to make sure the logs get to the destination in order to measure the time; "127.0.0.1 514" are the target IP address and port).
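For reference, the same batch could be sent over UDP by replacing the -t flag with -u; since UDP delivery is not guaranteed, this variant was not used for the timed measurements:

printf '<6>1 2013-04-25T22:00:00Z myhost kernel - - - this message is a test\n%.0s' {1..100000} | nc -w 1 -u 127.0.0.1 514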


4.2.2.1. Performance testing results:

Performance testing showed that in the given configuration these solutions rank in the following order, from highest performance to lowest:

1. ELSA

2. Kibana and Rsyslog

3. Kibana and Logstash

4. Graylog2

Figure 5 below shows the comparison of the performance test results in logs per second.

Figure 5 Performance test results statistics compared

*results of the tweaked setup (described at the end of 4.2.2.1) are presented in green

The CPU percentages stated in the test results are based on the indicators in htop, which interprets each virtual CPU core (a thread inside a physical CPU core) as 100% of a CPU. Both single-core and multi-core setups were used for each series of performance tests.



The use of 4 cores increased the performance of the log management systems: Graylog2 by about 70%, Kibana and Logstash by 60%, Kibana and Rsyslog by 25% and ELSA by 28,6% (see Figure 6).

Figure 6 Relative increase in performance with 4 cores

Graylog2 and Kibana showed a very good increase in the multi-core setup, as both programs are multi-threaded and CPU intensive. Since Elasticsearch, which is also CPU intensive, was run on the same machine in this test setup, adding more CPU power increased performance considerably. Kibana with Rsyslog and ELSA had a smaller increase in performance when more CPU cores were added. For Rsyslog this can be explained by the performance limits of the omelasticsearch module, which can send messages via TCP at up to 10 000 logs per second [29]; it is a good result for such modest hardware to achieve more than 50% of the maximum performance. ELSA is already so efficient that the change in performance was not as big. Additionally, the difference was hard to measure accurately using an external stopwatch.

4.2.2.1.1. Graylog2 performance test

Sending 100 000 IETF-formatted logs took on average 2 minutes and 50 seconds, which is about 588,2 logs per second.

This is an average score calculated from 20 tests. During the performance test most CPU was used by the Graylog2 server process, which utilised on average around 58% of CPU. The second most CPU-intensive process was Elasticsearch, which consumed on average close to 38%.

When 4 virtual cores were used, the time needed for handling 100 000 logs went down to an average of around 1 minute 40 seconds, which is 1000 logs per second.



As the number of logs per second was relatively low compared to the other systems, additional tests for Graylog2 were carried out with tweaked configurations. The tests were done using 4 virtual CPU cores. The best performance in this setup was achieved by limiting the number of processors used by Graylog2, which allowed more CPU to be used by Elasticsearch. During the test the most CPU was consumed by Elasticsearch, on average around 280%, which translates into 2.8 virtual cores. Graylog2 consumed on average around 100% CPU, which is one virtual core.

This was achieved by setting processbuffer_processors = 1 and outputbuffer_processors = 1 in the graylog2.conf file (see Appendix 7 for a configuration file sample). This setup is most likely not suitable for production as it might cause buffer overflows; it was used for testing purposes only and it eventually gave the best performance results. During this test the graylog2-server.jar process was started in the foreground to make sure there were no buffer overflow or other error messages caused by such a setup.

As a result of tweaking the settings, the best average time needed for handling 100 000 logs was 1 minute 10 seconds. This is about 1428,6 logs per second (see Figure 7).

Figure 7 Graylog2 performance test results, logs/sec



4.2.2.1.2. Kibana & Logstash performance test

The output in logstash.conf was set to elasticsearch_http. Grok, mutate and syslog_pri were used for filtering and indexing (see the advanced scenario in chapter 3.2.2).
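The full filter configuration follows the advanced scenario referenced above and is not repeated here. As a rough sketch only, a Logstash 1.1.x pipeline combining these plugins could be structured as follows; the listening port, the grok pattern and the Elasticsearch address are illustrative assumptions rather than the exact tested configuration, and the mutate rules are omitted:

input {
  tcp {
    port => 514          # assumed syslog listening port
    type => "syslog"
  }
}
filter {
  grok {
    type => "syslog"
    # assumed pattern: separate the PRI value from the rest of the message
    pattern => "<%{POSINT:syslog_pri}>%{GREEDYDATA:syslog_message}"
  }
  syslog_pri {
    type => "syslog"     # translates the PRI value into facility and severity fields
  }
}
output {
  elasticsearch_http {
    host => "127.0.0.1"  # assumed local Elasticsearch instance
  }
}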

Sending 100 000 IETF-formatted logs took on average 1 minute and 41 seconds, which is about 990 logs per second.

This is an average score from 20 tests with IETF-formatted messages. Most CPU was consumed by the Logstash server process, which took on average around 60% of CPU. The second most CPU-intensive process was Elasticsearch, which consumed on average around 35%. The Kibana.rb process consumed around 2% of CPU.

When the multi-core setup of 4 virtual cores was used, the time needed for handling 100 000 logs went down to an average of around 1 minute 3 seconds. This is about 1587 logs per second (see Figure 8).

During the multi-core test about 170%, which is 1,7 cores, was used by Logstash. Around 100%, which is 1 virtual core, was used on average by Elasticsearch, sometimes peaking at 150%. At the same time the Kibana.rb process was consuming 2-3% of a virtual CPU core.

Figure 8 Kibana and Logstash performance test results logs/sec



4.2.2.1.3. Kibana and Rsyslog performance test

In the single-core test Elasticsearch consumed on average about 85% of the CPU. Rsyslog consumed about 2-3% of a single core. Kibana stayed around the 2% mark of a single CPU core. Single-core test result: 100 000 IETF log lines in 22 seconds, i.e. 4545,45 logs per second (see Figure 9).

Figure 9 Kibana and Rsyslog performance test results, logs/sec

During the test using 4 virtual cores the Elasticsearch multi-process averaged around 250% of CPU, which is 2,5 virtual cores, sometimes peaking at 370%. The two Rsyslog processes utilised on average 12% CPU each. Kibana.rb consumed 2-3% of one virtual CPU core. Test result with 4 virtual cores: 100 000 IETF log lines in 17,6 seconds, i.e. 5681,82 logs per second.



4.2.2.1.4. ELSA performance test

Since ELSA log reception and log storage procedures are separated from each other and log data

is written into storage asynchronously, the event processing speed observed by the client is very

high, since there is no performance penalty that database access would incur. Nevertheless, while

asynchronous log storing provides performance benefits to the client, it also leaves the database

out of sync for a certain time frame (by default, for 1 minute). In order to provide a fair

comparison with other systems, the log reception and log storing times were measured separately

and added up. While this method is not 100% precise, it provides a good estimate of log data

processing time from the client's perspective.

According to the results of the single CPU core test, it takes about 9 seconds from sending 100 000 logs through Syslog-ng, which uses PatternDB for parsing them, until these logs become available for querying in the ELSA web interface. This is about 11 111 log lines per second.

In the multi-core CPU setup the operation, starting from sending the logs to getting results in the ELSA web interface, took about 7 seconds. This accounts for about 14285,7 logs per second (see Figure 10).

CPU consumption during the tests showed how efficient ELSA actually is. The most CPU-intensive processes were ELSA, Syslog-ng and Sphinx Search. When the single-core test was run, first ELSA and Syslog-ng consumed almost 50% of CPU each. Then, after the batch was loaded into the MySQL database, Sphinx shortly peaked at almost 100% CPU. The CPU peaks lasted one or two seconds. This shows how much more efficient ELSA, which relies on components written in C (Syslog-ng, MySQL and Sphinx), is compared to Graylog2 (running on Java) and Kibana (running on Ruby). The multi-core setup showed a somewhat different distribution with utilization of more resources. Syslog-ng and ELSA each used one full virtual core at 100%. Sphinx Search used one core at 100% and sometimes utilised more resources. The rest was used by MySQL and other processes.

Figure 10 ELSA performance test results logs/sec



4.2.3. Usability testing

In the author's opinion all three systems have well-built web interfaces. Kibana is the most dynamic of the three and, in the author's experience, the most usable. Graylog2 is very user-friendly and has many functions at the fingertips, such as user management and streams. It is a bit less dynamic than Kibana; one of the main reasons for this is that the sidebar is needed for showing log details, which adds an extra action. The choice depends on the environment where the system would be used. There are some important visual and functional differences which would most surely influence the decision. The qualities and features of the log management systems are discussed in chapter 4.2.3.1, where they are evaluated and ranked from the usability perspective based on the author's experience and opinion.

4.2.3.1. Usability testing results

The systems were given points for each test depending on the rating: first place gave 3 points, second place 1 point and third place 0 points (the 3, 1 and 0 point system was chosen to favour the solution that takes first place more often). In the author's opinion, considering pure usability experience, the programs can be put in the following order, with the most usable on top:

1. Kibana

2. Graylog2

3. ELSA

The table below contains the total usability score and scores for every test of each solution (see

Table 3).

Usability test results

                                  Graylog2   Kibana   ELSA
Visuals and design                    1        3        0
Saved searches                        3        0        1
Alerts                                1        0        3
Authentication and Authorisation      3        0        1
Search syntax                         1        3        0
Analytics                             0        3        1
Ease of use                           1        3        0
Universality                          1        3        0
Ease of installation                  0        3        1
Total:                               11       18        7

Table 3 Usability test score

Comments for each test are given in sections 4.2.3.1.1 - 4.2.3.1.9.


4.2.3.1.1. Visuals and design

Kibana and Graylog2 have more colourful interfaces with high-contrast schemes compared to ELSA. For the search field Kibana uses a bold black frame at the very top of the web page, which feels very comfortable as most browsers have the navigation bar (used for URL input) at the top of the page. Graylog2 uses quite a big part of the top of the page for the logo and the tabs; the search field is located right under the tabs.

Kibana has the most functional, user-friendly and nice-looking dashboards and graphs. ELSA would probably take second place as it uses Google visualizations; the drawback is that they depend on internet access (specifically access to the Google site).

Kibana has a solid dynamic interface which gives the feeling that everything is at the fingertips. Graylog2 uses a tab-like structure for menus; in comparison to Kibana it provides much more modest visualization and offers minimal data analysis. ELSA has a conservative-looking interface with grey dropdowns and sub-menus; the interface gets the job done, but seems a bit plain and rigid. In the author's opinion, considering visuals and design, the programs can be put in the following order, with the system having the best visuals on top:

1. Kibana

2. Graylog2

3. ELSA

4.2.3.1.2. Saved searches

Graylog2 streams are very easy to configure but require using regular expressions. Although Kibana does not have saved searches, there are workarounds for saving URLs that contain the query, and there is a feature request, so it is being worked on at the moment [30]. ELSA has saved searches based on a query and allows scheduling them. In the author's opinion, considering saved searches, the programs can be put in the following order, with the best options for saved searches on top:

1. Graylog2

2. ELSA

3. Kibana


4.2.3.1.3. Alerts

Graylog2 can send email alerts in case a pattern is matched in the incoming logs during a set period. A grace period option was added to the latest release, which allows limiting the number of notifications. Kibana does not have alert functionality.

ELSA allows scheduling saved queries which search within the new logs. If there are positive results for the query, a defined action such as an alert or a sub-query is triggered. ELSA supports a number of actions, e.g. email alerts, ticket creation and sub-query execution for a more precise search within the results. In the author's opinion, considering alert options, the programs can be put in the following order, with the best options for alerts on top:

1. ELSA

2. Graylog2

3. Kibana

4.2.3.1.4. Authentication and authorisation

In the author's opinion Graylog2 has the best authentication and authorisation options. It allows easily creating basic user accounts in the web interface and supports more complex authentication mechanisms such as LDAP. Graylog2 can easily be used with basic authentication, and settings can later be added into the ldap.yml configuration file for using LDAP.

Kibana's native authentication and authorisation module "kibana-ruby-auth" is currently under development [31]. As a workaround it is possible to use LDAP and other authentication methods through Phusion Passenger, e.g. as an Apache or Nginx module [32].

ELSA has three basic authentication and authorisation modes: none, local and LDAP. The first mode allows any user that accesses the web page to have administrative access as a pseudo-user. The second mode allows access based on credentials and group settings in the local system database. The third option is using LDAP/AD accounts and security groups. In the author's opinion, considering authentication and authorisation, the programs can be put in the following order, with the best options on top:

1. Graylog2

2. ELSA

3. Kibana


4.2.3.1.5. Search syntax

In earlier Graylog2 versions the search was split across multiple fields, some of which supported Apache Lucene syntax and some regular expressions. Starting from version 0.10 Graylog2 uses a single search field which supports pure Apache Lucene syntax. Saved searches are still defined with regular expressions and only offer the possibility to combine templates for matching positives; no templates defining exclusions can be added. So in general it is still a combination of Lucene and regular expressions. There is a quick filter function which allows filtering the search results by message, timeframe, facility, severity and host.

Kibana has always used the Apache Lucene search syntax. As dynamic queries are very easily created in Kibana, it is very simple to build very specific search patterns from scratch.

ELSA uses a close to Google-style search syntax, but the important difference is that no wildcards can be used in basic queries. Only asynchronous queries can have wildcards, in which case results come later by email, which is not very convenient in many cases. In the author's opinion, considering search syntax, the programs can be put in the following order, with the best application of search syntax on top:

1. Kibana

2. Graylog2

3. ELSA

4.2.3.1.6. Analytics

Concerning data analysis, Graylog2 has very limited functionality. It has some basic graphs which show the number of logs per given period.

Kibana offers flexible and functional analysis tools with very good dashboards. Kibana 3 allows creating custom interfaces and dashboards.

ELSA has good dashboards based on Google Visualisations, which are a powerful tool but require internet access from the server, which is not always a good option and sometimes not possible. In the author's opinion, considering analytics, the programs can be put in the following order, with the best analytics on top:

1. Kibana

2. ELSA

3. Graylog2


4.2.3.1.7. Ease of use

In the author's opinion Kibana is the most intuitive and the easiest to use. All operations take a minimum of clicks and movements and can be done in more than one way. Graylog2 version 0.11 has improved in terms of ease of use in comparison to 0.9x: a single search field was introduced which supports Apache Lucene syntax. It takes more operations than in Kibana to see the details of an event log; to do that, a permalink inside the sidebar has to be clicked, which is not very convenient. ELSA has the least intuitive and easy-to-use interface of the three solutions. In the author's opinion, considering ease of use, the programs can be put in the following order, with the easiest to use on top:

1. Kibana

2. Graylog2

3. ELSA

4.2.3.1.8. Universality

Central log management can be used in different environments: network administration, application development, system administration, web administration, software testing etc. Although there is no perfect universal solution to fit every environment, in the author's opinion Kibana is likely to fit more types of environment because of its high usability and analytics. ELSA is probably the least universal of the three solutions because it is designed specifically for high-scale network analysis. In the author's opinion, considering universality, the programs can be put in the following order, with the most universal on top:

1. Kibana

2. Graylog2

3. ELSA

4.2.3.1.9. Ease of installation

In the author's experience Kibana was the easiest and most straightforward to install of the three solutions. The installation of Elasticsearch consists of downloading, extracting and starting it. Kibana requires two more simple commands as it uses Ruby. Logstash is available as a single Java file. Rsyslog can be installed using packages (see Appendix 3).

Although ELSA has a fully automatic script tested on a number of Unix platforms, it works well on clean OS installations only. The script resolves dependencies and installs the whole solution within minutes and can be used for updating, but if there is an issue with a specific component it might fail. Troubleshooting can then be quite complicated as the structure is not trivial; manual installation is also quite complex.


In the author's opinion the Graylog2 installation was the most complicated because of its web interface, which added a lot of non-trivial installation steps. Another point is that, because of MongoDB, Graylog2 initially requires substantially more disk space, so the default 8 gigabytes of disk space assigned for CentOS by Oracle VirtualBox had to be increased. In the author's opinion, considering ease of installation, the programs can be put in the following order, with the easiest to install on top:

1. Kibana

2. ELSA

3. Graylog2


5. Implementation

Based on the research and testing it was decided to implement Kibana as the front end of the log management solution, with Rsyslog as the main log shipper. The estimated event rate is 1000 - 2000 events per second with peaks of up to 3000 events per second. According to the performance test results, which were higher than 4000 logs per second, Kibana should be a suitable solution for the environment.

5.1. Production environment

The environment for the implementation of the central log management system with Kibana as the front end is a small office in Tallinn. This is a central reservations office for a Norwegian company. There are around 150 client nodes; these are Intel hardware based workstations with Microsoft Windows 7 Professional managed through Active Directory. The main business-critical services are kept in a datacentre. There are some local servers, e.g. DNS, Active Directory, Microsoft SharePoint, antivirus management, Cacti network monitoring, a Samba fileserver etc. Most of the servers are installed as virtual machines on a Hyper-V server. There are two separate network lines: a dedicated line for internal business-critical traffic and a local ISP for internet access. The dedicated line connects to the datacentre and other offices. The main internal traffic is VoIP (Microsoft Lync), Citrix and some web applications.

5.2. Implementation of Kibana in production

The production environment of the target company currently has 10 switches, 5 routers, 10 servers and more than 150 workstations in the local network. There are 10 critical servers held in the cloud, which is a datacentre connected with a dedicated line to the local office. It would be a good solution to keep a backup of the logs from the main cloud servers locally. Additionally, datacentre storage space is much more expensive than local storage, so more logs can be kept.

In the first step of implementation the VHD with the Kibana log management solution was imported from the VirtualBox test environment to the Hyper-V server. As the operating system is CentOS, the drivers for Hyper-V are included and the migration was done with no issues [33].

Here are the specifications of the central local server running Hyper-V:

HP Proliant ML350G6 E5620 P410i/512+BBWC 3x2GB 3x146GB

30 GB HP REG PC3-10600

4x146GB 6G SAS 10K rpm SFF (2.5-inch) Dual Port Hard Drives


The current setup is that Kibana receives syslog messages from all Unix servers, the internal gateway, which is a Cisco 800 series router, and 10 HP ProCurve switches. The desired setup is to send log messages from all syslog-capable devices to Kibana. As Windows does not natively support syslog, a software client capable of converting the event log to syslog should be installed on all Windows-based workstations and servers (see Figure 11).

Figure 11 Scheme of Kibana implementation

While the authentication and saved search features are still being developed in Kibana, it is planned to use Apache with the Passenger module for authentication and to save searches manually; queries in Kibana generate URLs in Base64 format [30]. Rsyslog would have two parallel outputs: one into Elasticsearch and another one into text files. The text files would be kept to a minimum reasonable size and would be rotated using Rsyslog log rotation to avoid duplicate logging on the same machine. Alerting could be configured using Simple Event Correlator (SEC), which would watch the rotated log files created by Rsyslog and send emails if a pattern is matched.
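A rough sketch of this intended setup is given below; the file path, the index handling and the alert recipient are illustrative assumptions rather than the final production configuration:

# /etc/rsyslog.conf fragment (rsyslog 7 syntax, illustrative)
module(load="omelasticsearch")

# Output 1: forward all messages to the local Elasticsearch instance used by Kibana
# (the JSON template producing the @fields.* structure used by Kibana is omitted from this sketch)
*.* action(type="omelasticsearch" server="127.0.0.1" serverport="9200" bulkmode="on")

# Output 2: keep a plain-text copy for Simple Event Correlator to watch (path is an assumption)
*.* action(type="omfile" file="/var/log/central/all.log")

A matching SEC rule for the SSH authentication failure example used earlier in this thesis could look like the following (the mail command and recipient are assumptions):

# SEC rule file fragment (illustrative)
type=Single
ptype=RegExp
pattern=sshd\[\d+\]: Failed password for (\S+) from ([\d.]+)
desc=SSH authentication failure for $1 from $2
action=pipe '%s' /bin/mail -s 'SSH authentication failure' admin@example.com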


6. Future research

The log management solutions described in this thesis were tested for small business, but they could be used in bigger companies as well. These companies could benefit from using such open-source solutions, as the cost of log management can be very high in big companies when commercial solutions are used.

As the scope of the thesis is log management solutions for small business, the tests were carried out on modest hardware. Performance on more powerful hardware could be tested; this would show how the systems would suit a larger-scale environment. The tests were done on single nodes, so the scalability and performance of the solutions could also be tested by installing clusters of different sizes.

The usability test results provided in this thesis were based on the author's opinion. Similar testing for a larger environment could be carried out on a target group, as bigger companies have more personnel available.


7. Summary

In the author's opinion all three systems have well-built web interfaces that serve their intended purpose. The choice depends on the environment where the system would be used.

Graylog2 is a great tool for environments that need to give access to specific logs only. An example would be a company that provides IT services and has different teams: developers, system administrators, network administrators, supervisors etc.

Performance testing showed that ELSA is the fastest and can handle about 14285,7 logs per second with the modest hardware resources used for testing. As the solution is meant for small business, performance is not a crucial factor, so Graylog2 and Kibana can very well compete with ELSA in the given conditions.

According to the usability test results Kibana is the most usable system.

Kibana would be the best choice for environments that benefit from a combination of great usability, analytics and good performance.

ELSA should be suitable for high-volume and high-scale log management. It is specifically designed for network incident response and fighting APT. It seems like a great tool for large network monitoring; for example, ISPs or CERTs could benefit from using ELSA.

Kibana and Rsyslog were chosen for installation in the production environment because of their usability, ease of installation and suitable performance.


Vabavaralise logihaldussüsteemi valik väikeettevõttele

Magistritöö kood ITI70LT (30 EAP)

tudeng: Artjom Tšurilin matrikkli number: 113832IVCMM

Juhendaja: Risto Vaarandi, Ph.D

Resüme

Antud lõputöö keskendub kolme populaarse vabavaralise logihaldussüsteemi võrdlusele. Lõputöö

eesmärgiks on anda ülevaade kolmest populaarsest logihaldussüsteemist ja pakkuda juhiseid

sellise valikuks, mis parimal võimalikul moel sobiks väikeettevõttele.

Valik põhineb võrdleval analüüsil ning efektiivsuse ja kasutuskõlblikkuse testimisel.

Ettevõtte logiotsing ja arhiiv (ELSA) on ülimalt tõhus vabavaraline logihaldussüsteem, mis võib

silmad ette anda ettevõtte kvaliteetsetele kommertslahendustele. See on projekteeritud tõhusaks

häiringute tõrjeks ja võitluseks komplekssete püsiohtude (APT) vastu.

Kibana on logi analüüsi eeskomponent Logstash ja Elasticsearch jaoks. Seda võib samuti kasutada

muude tagasüsteemidega, mis toetavad vormindatud väljundit süsteemi Elasticsearch, sellist nagu

on Rsyslog lõppvalmistaja Elasticsearch mooduliga.

Graylog2 on alternatiivne logihaldusvahend omaenese veebi graafilise kasutajaliidesega (GUI).

Graylog2 eriomaduseks on, et logisid võib hõlpsasti jagada erinevatesse voogudesse, võimaldades

erinevatel kasutajatel juurdepääsu eri tüüpi logidele.

Tõhususe testimine näitas, et ELSA on kiireim ja suudab käsitleda umbes 14285,7 logi sekundis,

testimisel kasutatud tagasihoidlike riistvararessursside juures. Kuna lahendus on ette nähtud

väikeettevõtlusele, siis pole tõhusus otsustavaks teguriks ning Graylog2 ja Kibana suudavad väga

hästi antud tingimustes konkureerida ELSA-ga.

Lähtuvalt kasutuskõlblikkuse testi tulemustest on Kibana enim kasutuskõlblik ja süsteemseim.

Kibana koos Rsyslog’iga valiti sobivaimaks lahenduseks väikeettevõttele. Sellel on teatud

puudused, mis ilmnevad autentimisel ja salvestatud otsingutel, kuid kasutuskõlblikkus,

installimise kergus ja universaalsus teevad sellest väljapaistva lahenduse väikeettevõtlusele.

Puuduvad funktsioonid on väljatöötluse staadiumis, samas on võimalus kasutada

välismehhanisme ja vastukaalusid vea neutraliseerimiseks.


List of References

[1] K. Mani Chandy. Event-Driven Applications: Costs, Benefits and Design Approaches, California Institute of Technology, www.infospheres.caltech.edu/sites/default/files/Event-Driven%20Applications%20-%20Costs,%20Benefits%20and%20Design%20Approaches.pdf (accessed 29.03.2013)

[2] http://www.webopedia.com/TERM/E/event.html (accessed 05.03.2013)

[3] R. Vaarandi, Cyber Defense Monitoring Solutions, 1-event-logs-and-syslog

[4] http://www.ietf.org/rfc/rfc3164.txt (accessed 07.04.2013)

[5] http://www.rsyslog.com/doc/history.html (accessed 06.04.2013)

[6] http://www.balabit.com/network-security/Syslog-ng/opensource-logging-system/features/comparison (accessed 08.03.2013)

[7] R. Vaarandi, Cyber Defense Monitoring Solutions, 5-Syslog-ng-framework

[8] http://www.rsyslog.com/doc/licensing.html (accessed 06.04.2013)

[9] R. Gerhards, “Should I use rsyslog's new or old config style?” http://blog.gerhards.net/

(accessed 06.04.2013)

[10] http://www.graylog2.org/about (accessed 06.03.2013)

[11] https://docs.google.com/file/d/0By1KXg1ivlIeUjVoSVVjTVcxbzg/edit?pli=1 (accessed 18.03.2013)

[12] https://code.google.com/p/enterprise-log-search-and-archive/wiki/Documentation (accessed 18.03.2013)

[13] http://elasticsearch.com/products/elasticsearch/ (accessed 12.03.2013)

[14] http://www.sinatrarb.com/ (accessed 25.03.2013)

[15] http://enterprise-log-search-and-archive.googlecode.com/svn-history/r112/wiki/Documentation.wiki (accessed 18.03.2013)

[16] http://www.jboss.org/drools/drools-expert (accessed 10.03.2013)

[17] http://graphite.wikidot.com/ (accessed 25.03.2013)

[18] http://support.torch.sh/help/kb/graylog2-server/using-librato-metrics-with-graylog2

(accessed 06.03.2013)

[19] http://www.logstash.net/docs/1.1.10/ (accessed 08.04.2013)

[20] http://linuxdrops.com/log-management-using-logstash-and-kibana-on-centos-rhel-fedora/#

(accessed 10.03.2013)


[21] https://github.com/rashidkpc/Kibana/pull/261 (accessed 04.04.2013)

[22] https://gist.github.com/ (accessed 07.04.2013)

[23] https://code.google.com/p/enterprise-log-search-and-archive/ (accessed 18.03.2013)

[24] https://github.com (accessed 15.03.2013)

[25] http://technet.microsoft.com/en-us/library/cc794868%28v=ws.10%29.aspx (accessed

06.04.2013)

[26] http://linux.about.com/library/cmd/blcmdl1_time.htm (accessed 15.03.2013)

[27] http://netcat.sourceforge.net/ (accessed 15.03.2013)

[28] http://htop.sourceforge.net/ (accessed 10.03.2013)

[29] https://code.google.com/p/enterprise-log-management-appliance/wiki/omelasticsearch

[30] https://github.com/rashidkpc/Kibana/issues/326 (accessed 07.04.2013)

[31] https://github.com/rashidkpc/Kibana/issues/310 (accessed 06.04.2013)

[32] https://www.phusionpassenger.com/ (accessed 12.04.2013)

[33] http://wiki.centos.org/Manuals/ReleaseNotes/CentOS6.4 (accessed 15.03.2013)

[35] http://semicomplete.com/presentations/logstash-puppetconf-2012/#/ (accessed

12.03.2013)

[36] http://kibana.org/infrastructure.html (accessed 25.03.2013)

[37] http://graylog2.com/about (accessed 06.03.2013)

[38] http://support.torch.sh/help/kb/graylog2-web-interface/message-search-syntax (accessed

06.03.2013)


Appendices

Appendix 1 - Basic Event Log Cycle

[35]


Appendix 2 - Logstash Inputs, Filters and Outputs

Inputs: amqp, drupal_dblog, elasticsearch, eventlog, exec, file, ganglia, gelf, gemfire, generator, graphite, heroku, imap, irc, log4j, lumberjack, lumberjack2, pipe, rabbitmq, redis, relp, snmptrap, sqs, stdin, stomp, syslog, tcp, twitter, udp, varnishlog, websocket, xmpp, zenoss, zeromq

Filters: alter, anonymize, checksum, clone, csv, date, dns, environment, gelfify, geoip, grep, grok, grokdiscovery, json, kv, metrics, multiline, mutate, noop, ruby, sleep, split, syslog_pri, translate, urldecode, useragent, xml, zeromq

Outputs: amqp, boundary, circonus, cloudwatch, datadog, elasticsearch, elasticsearch_http, elasticsearch_river, email, exec, file, ganglia, gelf, gemfire, graphite, graphtastic, hipchat, http, internal, irc, juggernaut, librato, loggly, lumberjack, metriccatcher, mongodb, nagios, nagios_nsca, null, opentsdb, pagerduty, pipe, rabbitmq, redis, riak, riemann, sns, sqs, statsd, stdout, stomp, syslog, tcp, websocket, xmpp, zabbix, zeromq

[19]


Appendix 3 - Rsyslog main components installation

RPMs for installing Rsyslog v7

#!/bin/sh
wget http://rpms.adiscon.com/v7-stable/epel-6/x86_64/RPMS/libee-devel-0.4.1-1.el6.x86_64.rpm
wget http://rpms.adiscon.com/v7-stable/epel-6/x86_64/RPMS/libee-0.4.1-1.el6.x86_64.rpm
wget http://rpms.adiscon.com/v7-stable/epel-6/x86_64/RPMS/json-c-0.9-4.el6.x86_64.rpm
wget http://rpms.adiscon.com/v7-stable/epel-6/x86_64/RPMS/json-c-devel-0.9-4.el6.x86_64.rpm
wget http://rpms.adiscon.com/v7-stable/epel-6/x86_64/RPMS/libestr-0.1.5-1.el6.x86_64.rpm
wget http://rpms.adiscon.com/v7-stable/epel-6/x86_64/RPMS/libestr-devel-0.1.5-1.el6.x86_64.rpm
wget http://rpms.adiscon.com/v7-stable/epel-6/x86_64/RPMS/liblognorm-devel-0.3.4-5.el6.x86_64.rpm
wget http://rpms.adiscon.com/v7-stable/epel-6/x86_64/RPMS/liblognorm-0.3.4-5.el6.x86_64.rpm
wget http://rpms.adiscon.com/v7-stable/epel-6/x86_64/RPMS/rsyslog-7.2.6-3.el6.x86_64.rpm
wget http://rpms.adiscon.com/v7-stable/epel-6/x86_64/RPMS/rsyslog-elasticsearch-7.2.6-3.el6.x86_64.rpm

Installing from RPM

rpm -ivh libee-devel-0.4.1-1.el6.x86_64.rpm libee-0.4.1-1.el6.x86_64.rpm json-c-0.9-4.el6.x86_64.rpm json-c-devel-0.9-4.el6.x86_64.rpm libestr-0.1.5-1.el6.x86_64.rpm libestr-devel-0.1.5-1.el6.x86_64.rpm liblognorm-devel-0.3.4-5.el6.x86_64.rpm liblognorm-0.3.4-5.el6.x86_64.rpm rsyslog-7.2.6-3.el6.x86_64.rpm rsyslog-elasticsearch-7.2.6-3.el6.x86_64.rpm


Appendix 4 - Kibana setup example scheme

[36]


Appendix 5 - TCP and UDP input options in Logstash

Options for Logstash TCP Input

input {

tcp {

add_field => ... # hash (optional), default: {}

charset => ... # string, one of [full list of supported character sets is available on the homepage]

data_timeout => ... # number (optional), default: -1

debug => ... # boolean (optional), default: false

format => ... # string, one of ["plain", "json", "json_event", "msgpack_event"] (optional)

host => ... # string (optional), default: "0.0.0.0"

message_format => ... # string (optional)

mode => ... # string, one of ["server", "client"] (optional), default: "server"

port => ... # number (required)

ssl_cacert => ... # a valid filesystem path (optional)

ssl_cert => ... # a valid filesystem path (optional)

ssl_enable => ... # boolean (optional), default: false

ssl_key => ... # a valid filesystem path (optional)

ssl_key_passphrase => ... # password (optional), default: nil

ssl_verify => ... # boolean (optional), default: false

tags => ... # array (optional)

type => ... # string (required)

}

}


Options for Logstash UDP Input

input {

udp {

add_field => ... # hash (optional), default: {}

buffer_size => ... # number (optional), default: 8192

charset => ... # string, one of [full list of supported character sets is available on the homepage]

debug => ... # boolean (optional), default: false

format => ... # string, one of ["plain", "json", "json_event", "msgpack_event"] (optional)

host => ... # string (optional), default: "0.0.0.0"

message_format => ... # string (optional)

port => ... # number (optional), default: 9999

tags => ... # array (optional)

type => ... # string (required)

}

}

[19]


Appendix 6 – Graylog2 setup example scheme

[37]


Appendix 7 – Graylog2 tweaked settings

Graylog2.conf (this is a part of the configuration; the rest is kept at default values)

is_master = true

plugin_dir = /opt/graylog2-server/plugin

syslog_listen_port = 514

syslog_listen_address = 0.0.0.0

syslog_enable_udp = true

syslog_enable_tcp = true

syslog_use_nul_delimiter = false

syslog_store_full_message = true

udp_recvbuffer_sizes = 1048576

elasticsearch_config_file = /etc/graylog2-elasticsearch.yml

elasticsearch_max_docs_per_index = 20000000

elasticsearch_index_prefix = graylog2

elasticsearch_max_number_of_indices = 20

elasticsearch_shards = 4

elasticsearch_replicas = 0

output_batch_size = 5000

processbuffer_processors = 1

outputbuffer_processors = 1

processor_wait_strategy = blocking

ring_size = 1024

mongodb_useauth = true

mongodb_user = grayloguser

mongodb_password = secret

mongodb_host = 127.0.0.1

#mongodb_replica_set = localhost:27017,localhost:27018,localhost:27019

mongodb_database = graylog2

mongodb_port = 27017

……….


Appendix 8 – Graylog2, Kibana and ELSA component details

Graylog2 main components:

graylog2-server: 0.11.0

graylog2-web-interface: 0.11.0

Elasticsearch 20.6

java 1.7 java version "1.7.0_17" Java(TM) SE Runtime Environment (build 1.7.0_17-b02) Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)

ruby 1.9 ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-linux]

mongo db MongoDB shell version: 2.4.1

httpd webserver Apache/2.2.15 (Unix) with passenger-3.0.19 module

Kibana main components:

Kibana version 7223de1

Elasticsearch 20.6

java 1.7 java version "1.7.0_17"

Java(TM) SE Runtime Environment (build 1.7.0_17-b02)

Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)

ruby 1.9 ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-linux]

logstash-1.1.9-monolithic.jar and logstash-1.1.10-flatjar.jar (Logstash or Rsyslog)

rsyslog 7.2.7 (v7-stable)

httpd webserver Apache/2.2.15 (Unix) with passenger-3.0.19 module

ELSA main components:

ELSA node and ELSA web

Syslog-ng 3.2.4

MySQL Server version: 5.1.67 Source distribution Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Sphinx 2.0.5-id64-release (r3308) Copyright (c) 2001-2012, Andrew Aksyonoff

Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

perl, v5.10.1 (*) built for x86_64-linux-thread-multi

Copyright 1987-2009, Larry Wall


Appendix 9 – Lucene search

Wildcards

There are two supported wildcard operators: ? for a single character and * for zero or more

characters.

If Exception and Exzeption are both suitable search results, then the following query can be used: Ex?eption

If the goal is foo1, foobar1 or foobaz1, this query should be used: foo*1

Note that a wildcard operator can't be used at the beginning of a term.

Fuzzy searches

You can do fuzzy searches which use the Levenshtein distance (edit distance) algorithm. Just put a tilde (~) at the end of a term to perform a fuzzy search: roam~ would find, for example, roam, foam and roams.

You can also specify a required similarity (between 0 and 1): roam~0.8. With a value closer to 1, only terms with a higher similarity will be matched. The default is 0.5.

Proximity searches

Graylog2 allows searching for words that are within a specified distance of each other. Just use the tilde (~) at the end of a phrase: "Exception payment"~10. This will find all messages with Exception and payment within a distance of 10 words of each other.

Boolean operators

By default all terms or phrases are combined with OR. Custom conjunctions can be defined:

AND (&&), OR (||) or NOT (!). Note that these operators must be all uppercase.

Some examples:

Exception AND payment

Exception && payment

"Exception in payment subsystem" OR fatal

"Exception in payment subsystem" || fatal

"Exception in payment subsystem" AND "fatal error"


Exception NOT fatal

Plus and minus signs can be also used to indicate if a term or phrase must be included or not:

+fatal error # must contain _fatal_, might contain _error_

"fatal Exception" -payment # must contain _fatal exception_, must not contain _payment_

Escaping

The following characters need to be escaped with a backslash \ if they are not meant to be part of

the search syntax:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

[38]


Appendix 10 – Kibana search examples

As Kibana uses the Apache Lucene syntax, here are some examples of searches that can be done.

@source_host=192.168.1.110 AND magician AND 10.177.7.166

This query will show the logs originating from the source IP 192.168.1.110 that contain the strings "10.177.7.166" and "magician" inside any message field. The more fields are parsed, the more specific the search query can be made and the more possibilities there are to analyse the logs.

Here are two examples of similar logs sent from a remote host to the Kibana server.

Note that the source IPs are different.

<30>2013-04-18T08:13:30.133646+03:00 log sshd[7777]: Accepted

password for magician from 10.177.7.166 port 51757 ssh2

<30>2013-04-18T08:13:44.834626+03:00 log sshd[5876]: Accepted

password for magician from 10.177.7.177 port 52696 ssh2

These are two successful SSH login messages received from 192.168.1.110. They state that user magician logged in via SSH to 192.168.1.110 from IP 10.177.7.166 and then from 10.177.7.177.

Here is how the message might look parsed into very basic fields:

@message <30>2013-04-18T08:13:30.133646+03:00 log sshd[7777]:

Accepted password for magician from 10.177.7.166 port 51757 ssh2

@source tcp://192.168.1.110:43021/

@source_host 192.168.1.110

@source_path /

@tags

@timestamp 2013-04-24T07:15:07.091Z

@type rsyslog

Below is an example of an IETF syslog formatted test message presented to Kibana. This message was parsed and indexed with the following fields (starting with @), defined by the pattern included in the rsyslog.conf file.

@fields.facility kern

@fields.host myhost

@fields.msgtext this message is a test

@fields.receptiontime 2013-04-26T01:47:44.233921+03:00

@fields.severity info

@fields.tag kernel


@message <6>1 2013-04-25T22:00:00Z myhost kernel - - - this

message is a test

@tags

@timestamp 2013-04-25T22:00:00Z

@type syslog

It is possible to apply an action to the query result using a pipe:

@fields.host:log NOT @fields.facility:"user" AND NOT @fields.facility:"kern" | terms severity


Appendix 11 – ELSA search examples

Here are some query examples in ELSA.

Queries can be very simple, like looking for any mention of an IP address:

10.0.20.1

Or a website

site:www.google.com

Here is an example query for finding Symantec Anti-Virus alerts in Windows logs on ten hosts that do not contain the keyword "TrackingCookie":

+eventid:51 host>10.0.0.10 host<10.0.0.20 -TrackingCookie

One could also look for account lockouts that do not come from certain hosts:

+class:windows +locked -host>10.0.0.10 -host<10.0.0.20

To see what hosts have had lockout events, one could run:

+class:windows +”locked out”

and choose the ANY.host field from the “Report On” menu.


Appendix 12 – ELSA performance test details

It was measured that it takes about 3 seconds for 100 000 log lines sent from netcat on localhost to be written through Syslog-ng into the text file. Based on this number, the batch load interval from the text file into MySQL was set to 5 seconds, with 2 extra seconds added to take into account input/output latency and human reaction time.

Using the watch -n 1 ls -la command in the $DATA_DIR/elsa/tmp/buffers/ directory (/data/elsa/tmp/buffers/) it was possible to see when the batches were loaded. As soon as a batch was loaded, the logs were sent and the stopwatch was started; then the prepared search was refreshed in the web browser until the results appeared.

Once the batch is loaded into the MySQL database, the raw file is deleted and replaced by a new empty raw file.

After the logs are loaded from the raw file, they are indexed within a moment. It takes about one more second to return results for the query host=192.168.1.111 (around a million logs were returned by the search in this test, and the query time shown was around 1000 milliseconds).

The performance test showed that a batch of 100 000 log lines is loaded in around one second, which corresponds to the value declared in the ELSA online documentation.